[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142227#comment-14142227 ]

Peter Bailis commented on CASSANDRA-7056:
------------------------------------------

bq. Let's assume we query from partition A and B, and we see the results don't match timestamps; we would pull the latest batchlog assuming they are from the same batch, but let's say they in fact are not. In this case we wasted a lot of time, so my question is: should we only do this if the user supplies a new CL type?

If you set the same, unique (e.g., UUID) write timestamp for all writes in a batch, then you know that any results with different timestamps are part of different batches. So, given mismatched timestamps, should you check the batchlog for pending writes? One solution is to always check (as in RAMP-Small). This doesn't require any extra metadata but, as you point out, also requires 2 RTTs. To cut down on these RTTs, you could also attach a Bloom filter of the items in each batch and only check any possibly missing writes (as in RAMP-Hybrid). (I can go into more detail if you want.) However, I agree that you might not want to pay these costs *all* of the time for reads. Would a BATCH_READ or other modifier to CQL SELECT statements make sense?

bq. In the case of a global index we plan on reading the data after reading the index. The data query might reveal the indexed value is stale. We would need to apply the batchlog and fix the index; would we then restart the entire query? Or maybe overquery, assuming some index values will be stale? Either way, this query looks different than the above scenario.

I think there are a few options. The easiest is to simply filter out the out-of-date rows, and then you are guaranteed to see a subset of the index entries. Alternatively, you could provide a snapshot index read where you read the older, overwritten values from the data node. If you want both a read-latest and a read-snapshot mode, there are some options I can describe, but they generally entail either more metadata or, otherwise, using locks/blocking coordination, which I don't think you want.

Add RAMP transactions
---------------------

Key: CASSANDRA-7056
URL: https://issues.apache.org/jira/browse/CASSANDRA-7056
Project: Cassandra
Issue Type: Wish
Components: Core
Reporter: Tupshin Harper
Priority: Minor

We should take a look at [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/] transactions, and figure out if they can be used to provide more efficient LWT (or LWT-like) operations.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
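To make the RAMP-Hybrid idea in the comment above concrete, here is a minimal Java sketch of the client-side check, assuming each first-round result carries its batch's unique timestamp plus a Bloom filter over the keys written in that batch; mismatched timestamps trigger a second-round fetch only for keys the filter says might be affected. The types and names are illustrative stand-ins, not Cassandra code.

{code:java}
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical first-round read result: the value's batch timestamp plus a
 *  Bloom filter over the keys written in the same batch (RAMP-Hybrid metadata). */
final class VersionedRead {
    final String key;
    final long batchTimestamp;            // unique per batch, e.g. derived from a UUID
    final Predicate<String> batchFilter;  // Bloom filter: "might key k be in my batch?"
    VersionedRead(String key, long ts, Predicate<String> filter) {
        this.key = key; this.batchTimestamp = ts; this.batchFilter = filter;
    }
}

final class RampHybridCheck {
    /** Returns the (key, timestamp) pairs that need a second-round fetch because a
     *  sibling write from a newer batch may be missing from the first-round result. */
    static Map<String, Long> missingSiblings(Collection<VersionedRead> firstRound) {
        Map<String, Long> secondRound = new HashMap<>();
        for (VersionedRead newer : firstRound) {
            for (VersionedRead other : firstRound) {
                if (other == newer) continue;
                // If 'other' is older than 'newer' and 'newer's batch may have
                // written other.key, we might be missing that write: re-fetch
                // other.key at version newer.batchTimestamp. Bloom-filter false
                // positives only cost an unnecessary check, never correctness.
                if (other.batchTimestamp < newer.batchTimestamp
                        && newer.batchFilter.test(other.key)) {
                    secondRound.merge(other.key, newer.batchTimestamp, Long::max);
                }
            }
        }
        return secondRound;
    }
}
{code}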
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048119#comment-14048119 ]

Peter Bailis commented on CASSANDRA-7056:
------------------------------------------

bq. I doubt this will be dramatically more complex, but the approach to implementation is fundamentally different. It seems to me supporting transactions of arbitrary size is an equally powerful win to consistent transactions.

I agree streaming batches could be really useful. In effect, you're turning an operation you'd have to perform client-side (e.g., you can simulate streaming by simply buffering your write sets and then calling one big BATCH) into a server-assisted one (where your proposed read-buffer/memtable stores the pending inserts while you're still deciding what goes into the transaction). From the RAMP perspective, this doesn't change things substantially -- you just have to make sure to propagate the appropriate txn metadata after you've determined what writes made it into the batch.

[~benedict]: towards your point on non-QUORUM but QUORUM-like reads, I agree there are some cool tricks to play. There's some additional complexity in these optimizations, but the basic observation is a good one: if I already have a transaction ID I want to read from and the metadata associated with it, all I have to do is find the matching versions, which doesn't necessarily require QUORUM reads for consistency w.r.t. the ID.
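A minimal sketch of the "read by known transaction ID" observation above: if the reader already holds the transaction ID it wants (and its metadata), it can accept the first replica whose version matches that ID, with no quorum needed for consistency with respect to that ID. The Version/Replica types here are hypothetical stand-ins, not Cassandra internals.

{code:java}
import java.util.*;

final class TxnIdRead {
    record Version(long txnId, byte[] value) {}

    /** Hypothetical replica interface; may return older versions too. */
    interface Replica { Optional<Version> read(String key, long txnId); }

    static Optional<byte[]> readAtTxn(List<Replica> replicas, String key, long txnId) {
        for (Replica r : replicas) {                        // try replicas one at a time
            Optional<Version> v = r.read(key, txnId);
            if (v.isPresent() && v.get().txnId() == txnId)  // exact match on the known id
                return Optional.of(v.get().value());
        }
        return Optional.empty();  // no replica had it; fall back to a regular read
    }
}
{code}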
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043771#comment-14043771 ]

Peter Bailis commented on CASSANDRA-7056:
------------------------------------------

bq. RAMP has a requirement that anything being read/written that way is always written in the same groupings. If you update B,C and then update A,B. You can't read B,C anymore successfully, as the times on B and C will never match.

This isn't entirely correct. Let's say I do an atomic batch B1 that writes B = 1 and C = 1 with timestamp 1, then you do an atomic batch B2 that writes A = 2 and B = 2 at timestamp 2. Under RAMP, subsequent batch reads from B and C will return B = 2, C = 1. The timestamps on B and C will indeed (as you point out) not match, but simply returning matching timestamps is *not* the goal: the goal is that if you read any write in a given batch, you will be able to read the rest of the writes in the batch (i.e., if you also attempt to read any other items that were written in the batch, you will see the corresponding writes).
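A toy trace of the example above, assuming a multi-versioned store that retains old versions (as RAMP requires); it shows why B = 2, C = 1 is the legitimate result. This illustrates the semantics only, not the RAMP read protocol itself.

{code:java}
import java.util.*;

public class RampTrace {
    // key -> (timestamp -> value); all versions are retained, as RAMP requires
    static final Map<String, NavigableMap<Long, Integer>> store = new HashMap<>();

    static void batchWrite(long ts, Map<String, Integer> writes) {
        // every write in a batch shares the batch's unique timestamp
        writes.forEach((k, v) ->
            store.computeIfAbsent(k, x -> new TreeMap<>()).put(ts, v));
    }

    public static void main(String[] args) {
        batchWrite(1, Map.of("B", 1, "C", 1));   // B1
        batchWrite(2, Map.of("A", 2, "B", 2));   // B2
        for (String key : List.of("B", "C")) {   // batch read {B, C}
            Map.Entry<Long, Integer> latest = store.get(key).lastEntry();
            System.out.printf("%s = %d (ts %d)%n", key, latest.getValue(), latest.getKey());
        }
        // prints: B = 2 (ts 2), then C = 1 (ts 1): mismatched timestamps,
        // yet no batch is fractured -- every requested item written by B2
        // (here only B) reflects B2.
    }
}
{code}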
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043851#comment-14043851 ]

Peter Bailis commented on CASSANDRA-7056:
------------------------------------------

[~jjordan] Good question. The short answer is that this behavior (reading A@2 and C@1) is well-defined under RAMP. Just like in Cassandra today, the fact that I read a write at time 500 doesn't mean I'm going to see the effects of all writes that occur before time 500! Rather, the guarantee that RAMP adds is that, once you see the effects of one write in the batch, you'll see all of the writes in the batch.

So, in your scenario, you have three batches: B1 {A=1, B=1} at time 1, B1.5 {B=1.5, C=1.5} at time 1.5, and B2 {A=2, B=2} at time 2. You could get the behavior you describe above if B1 executes and completes, B2 executes and completes, and we subsequently read sometime before B1.5 completes. So, I guess I disagree that the real C you should be getting is the one from the batch at time 1.5, because you didn't yet see the effect of any writes from B1.5. However, once B1.5 completes, you *will* be guaranteed to read C at time 1.5.

It may be easier to think of RAMP as providing the ability to take each of your normal reads and writes under LWW and turn them into multi-column, multi-table writes that are all going to be visible/reflected in the table state (once completed). There are no special ordering guarantees beyond what Cassandra already provides; if you need strong ordering guarantees (e.g., enforcing sequential assignment of timestamps), that's a case for CAS.
[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions
[ https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042848#comment-14042848 ]

Peter Bailis commented on CASSANDRA-7056:
------------------------------------------

As I mentioned at the Next-Generation Cassandra Conference, I'm happy to get the ball rolling on an implementation of RAMP in Cassandra. To reiterate a few points from the NGCC, I think RAMP could provide some useful isolation guarantees for Cassandra's atomic batch operations (either none of the updates are visible, or all are) as well as provide the basis for consistent global secondary index updates in CASSANDRA-6477.

I've posted my slides from the NGCC on SpeakerDeck; the Cassandra-specific implementation details start on transition number 287: https://speakerdeck.com/pbailis/scalable-atomic-visibility-with-ramp-transactions

I have some time to hack on this and am willing to work on a patch and/or hammer out the Cassandra-specific design with you all over JIRA or otherwise!
[jira] [Commented] (CASSANDRA-6477) Global indexes
[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042865#comment-14042865 ]

Peter Bailis commented on CASSANDRA-6477:
------------------------------------------

Just to follow up regarding our conversations at the NGCC: is there any interest at this point in delivering any form of consistent index updates (e.g., beyond eventually consistent index entries), or is the primary goal right now simply to get basic global index functionality working?

Also, FWIW, though I have yet to think through [~benedict]'s second proposal, the local CAS approach makes a lot of sense from my perspective insofar as you're willing to tolerate RF-1 redundant (but, importantly, idempotent!) index invalidations. I think this will work especially well when, per [~tjake]'s proposal, you start partitioning the secondary index servers.

Global indexes
--------------

Key: CASSANDRA-6477
URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
Project: Cassandra
Issue Type: New Feature
Components: API, Core
Reporter: Jonathan Ellis
Fix For: 3.0

Local indexes are suitable for low-cardinality data, where spreading the index across the cluster is a Good Thing. However, for high-cardinality data, local indexes require querying most nodes in the cluster even if only a handful of rows is returned.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668054#comment-13668054 ]

Peter Bailis commented on CASSANDRA-5455:
------------------------------------------

bq. Do we need any core changes at all, then? (Under the #3 for now plan.)

Nope; the predictor I linked uses the per-CF latency metrics. The downside is accuracy.

Remove PBSPredictor
-------------------

Key: CASSANDRA-5455
URL: https://issues.apache.org/jira/browse/CASSANDRA-5455
Project: Cassandra
Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Fix For: 2.0
Attachments: 5455.txt

It was a fun experiment, but it's unmaintained and the bar to understanding what is going on is high. Case in point: PBSTest has been failing intermittently for some time now, possibly even since it was created. Or possibly not and it was a regression from a refactoring we did. Who knows?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13666888#comment-13666888 ]

Peter Bailis commented on CASSANDRA-5455:
------------------------------------------

Okay. #1 will likely require more extensive code changes: basically, it'll require EstimatedHistograms for each of the servers acting as replicas for a given ColumnFamily and will require EstimatedHistogram tracing in the StorageProxy (to separate network-based latency from disk-based latency). Are these changes feasible?

re: a window of individual latency times: looking at the Metrics implementation of EstimatedHistogram, EstimatedHistogram.values() should provide a reasonable enough sample (especially since, as you mention, it has other uses as well).

Perhaps the simplest strategy is to go with #3 for now but implement #1 in the future if there's interest. #3 is easy; I've already written an example external module to do RTT/2 predictions: https://github.com/pbailis/pbs-predictor/blob/9d31acd1667b08affa609278689b540d8e0380f5/pbspredictor/src/main/java/edu/berkeley/pbs/cassandra/CassandraLatencyTrace.java
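For reference, a compact sketch of the kind of RTT/2-based prediction such an external module could do: sample round-trip latencies (e.g., from EstimatedHistogram.values()), approximate one-way delays as RTT/2, and Monte Carlo-estimate the probability of a consistent read t ms after a write. This illustrates the approach only, not the linked module's code; it simplifies the PBS model by treating acks and read responses as free.

{code:java}
import java.util.Arrays;
import java.util.Random;

public class PbsMonteCarlo {
    /** Estimate P(read is consistent) for an (n, r, w) configuration,
     *  tAfterWriteMs after the write commits, from sampled round trips. */
    static double pConsistent(long[] rttSamplesMs, int n, int r, int w,
                              long tAfterWriteMs, int trials, long seed) {
        Random rng = new Random(seed);
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++) {
            double[] writeArrive = new double[n];  // when the write reaches replica i
            double[] readArrive = new double[n];   // one-way delay of the later read
            for (int i = 0; i < n; i++) {
                writeArrive[i] = oneWay(rttSamplesMs, rng);
                readArrive[i] = oneWay(rttSamplesMs, rng);
            }
            // The write "commits" once w replicas have it (acks approximated as free).
            double[] sortedWrites = writeArrive.clone();
            Arrays.sort(sortedWrites);
            double commit = sortedWrites[w - 1];
            // The read waits for the r fastest replicas; it is consistent if any
            // of those r had received the write by the time the read arrived.
            double[] sortedReads = readArrive.clone();
            Arrays.sort(sortedReads);
            double cutoff = sortedReads[r - 1];
            boolean ok = false;
            for (int i = 0; i < n && !ok; i++)
                ok = readArrive[i] <= cutoff
                        && writeArrive[i] <= commit + tAfterWriteMs + readArrive[i];
            if (ok) consistent++;
        }
        return (double) consistent / trials;
    }

    private static double oneWay(long[] rttSamples, Random rng) {
        return rttSamples[rng.nextInt(rttSamples.length)] / 2.0;  // RTT/2 approximation
    }
}
{code}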
[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657705#comment-13657705 ]

Peter Bailis commented on CASSANDRA-5455:
------------------------------------------

I've thought some more about different options for enabling metrics that are useful both to PBS (in an external module, if committers prefer) and to anyone else who would be interested in finer-grained tracing.

To start, I *do* think that there is interest in a PBS module: if an eventually consistent store is returning stale data, how stale *is* it? Especially given that many (most?) Cassandra client libraries (including the Datastax java-driver) choose CL=ONE by default, I'd expect most users would prefer to understand how their choice of N, R, and W affects their latency and consistency. I've been contacted by several Cassandra users who are interested in and/or using this functionality, and I understand that several developers are interested in PBS for Riak (notably, Andy Gross highlighted PBS in his 2013 RICON East keynote as a useful feature Basho would like). We originally chose Cassandra based on our familiarity with the code base and on early discussions with Jonathan, but we plan to integrate PBS functionality into Riak with the help of their committers in the near-term future.

So I do think there is interest, and, if you're curious about *use cases* for this functionality, Shivaram and I will be demoing PBS in Cassandra 1.2 at the upcoming SIGMOD 2013 conference. Our demo proposal sketches three application vignettes, including the obvious integration with monitoring tools but also automatically tuning N, R, and W and providing consistency and latency SLAs: http://www.bailis.org/papers/pbs-demo-sigmod2013.pdf

On the more technical side, there are two statistics required for accurate PBS predictions that aren't currently measured (in trunk). First, PBS requires per-server statistics. Currently, the ColumnFamily RTT read/write latency metrics are aggregated across all servers. Second, PBS requires a measure of how long a read/write request takes before it is processed (i.e., how long it took from a client sending each read/write request to when it was performed). This requires knowledge of one-way request latencies as well as read/write request-specific logic. The 1.2 PBS patch provided both of these, aggregating by server and measuring the delay until processing. As Jonathan notes above, the latter measurement was conservative: the remote replica recorded the time that it enqueued its response rather than the exact moment a read or write was performed, namely for simplicity of code. The coordinating server could then closely approximate the return time as RTT-(remote timestamp).

Given these requirements and the current state of trunk, there are a few ways forward to support an external PBS prediction module:

1a.) Modify Cassandra to store latency statistics at a per-server and per-ColumnFamily granularity. As Rick Branson has pointed out, this is actually useful for monitoring other than PBS and can be used to detect slower replicas.

1b.) Modify Cassandra to store local processing times for requests (i.e., expand StorageMetrics, which currently does not track the time required to, say, fulfill a local read stage). This also has the benefit of showing whether a Cassandra node is slow due to network or disk.

2.) Use the newly developed tracing functionality to reconstruct latencies for selected requests. Performing any sort of profiling will require tracing to be enabled (this appears to be somewhat heavyweight given the amount of data that is logged for each request), and reconstructing latencies from the trace table may be expensive (i.e., amount to a many-way self-join).

3.) Use RTT/2 based on ColumnFamily LatencyMetrics as an inaccurate but already supported external predictor.

4.) Leave the PBS latency sampling as in 1.2 but remove the PBS predictor code. Expose the latency samples via an MBean for users like Rick who would benefit from them.

Proposal #1 has benefits for many users and seems a natural extension to the existing metrics but requires changes to the existing code. Proposal #2 puts a substantial burden on an end user and, without a fixed schema for the trace table, may amount to a fair bit of code munging. Proposal #3 is inaccurate but works on trunk. Proposal #4 is essentially 1.2.0 without the requirement to maintain any PBS-specific code and is a reasonable stop-gap before proposal #1. All of these proposals are amenable to sampling.

I'd welcome your feedback on these proposals and next steps.
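For concreteness, a minimal sketch of what proposal 1a's per-(replica, ColumnFamily) latency tracking could look like, with a tiny fixed-size reservoir standing in for an EstimatedHistogram; the class names are illustrative, not existing Cassandra code.

{code:java}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerReplicaLatencyTracker {
    /** Tiny fixed-size reservoir standing in for an EstimatedHistogram. */
    static final class Reservoir {
        private final long[] samples = new long[10_000];
        private int next = 0;
        synchronized void update(long micros) {
            samples[next] = micros;            // overwrite oldest sample
            next = (next + 1) % samples.length;
        }
        synchronized long[] snapshot() { return samples.clone(); }
    }

    private final Map<String, Reservoir> byReplicaAndCf = new ConcurrentHashMap<>();

    /** Record one request latency for a (replica, ColumnFamily) pair. */
    void record(InetAddress replica, String columnFamily, long latencyMicros) {
        byReplicaAndCf
            .computeIfAbsent(replica.getHostAddress() + "/" + columnFamily,
                             k -> new Reservoir())
            .update(latencyMicros);
    }

    Reservoir forReplica(InetAddress replica, String columnFamily) {
        return byReplicaAndCf.get(replica.getHostAddress() + "/" + columnFamily);
    }
}
{code}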
[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654669#comment-13654669 ]

Peter Bailis commented on CASSANDRA-5455:
------------------------------------------

I am one of the original authors of CASSANDRA-4261 and was previously unaware of this change. I'm happy to make any changes to the tests, perform necessary code refactoring, or write additional documentation (but was unable to do so given the window between ticket creation and commit). That is, I will maintain this functionality given the opportunity to do so.

Could you please elaborate on what you'd like to see fixed? I suspect it'll be fairly straightforward, and, if anyone knows how to make the changes, I (and Shivaram) probably do.

If the answer is that we don't want this functionality, then that's a different case. But that's not what I'm getting from this ticket or CASSANDRA-4261, or am hearing from users.
[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor
[ https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654761#comment-13654761 ]

Peter Bailis commented on CASSANDRA-5455:
------------------------------------------

I don't believe that the StorageProxy tracks the latencies at the same granularity. For example, the PBS latency tracking records both how long it took for the request to reach a remote replica and be processed as well as how long the return trip takes.

That said, it shouldn't be too difficult to either 1.) simply expose the recorded latencies via an optional module providing a finer-granularity tracing interface via JMX (thereby removing all actual prediction code but keeping the logging in place for folks who might want it), or 2.) modify StorageProxy to log these latencies in addition to the coarser-granularity measurements it already takes. I can provide assistance with either.
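A minimal sketch of option 1.) above, assuming a plain JMX standard MBean that exposes the raw latency samples for an external module to poll; the bean name and methods are hypothetical, not part of Cassandra.

{code:java}
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

/** Hypothetical MBean interface exposing the recorded latency samples. */
interface LatencyTraceMBean {
    long[] getRequestLatenciesMicros();   // coordinator -> replica (incl. processing)
    long[] getResponseLatenciesMicros();  // replica -> coordinator
}

final class LatencyTrace implements LatencyTraceMBean {
    private volatile long[] requestLatencies = new long[0];
    private volatile long[] responseLatencies = new long[0];

    public long[] getRequestLatenciesMicros()  { return requestLatencies.clone(); }
    public long[] getResponseLatenciesMicros() { return responseLatencies.clone(); }

    void setSamples(long[] request, long[] response) {
        requestLatencies = request.clone();
        responseLatencies = response.clone();
    }

    /** Register under an illustrative JMX name so external tools can poll it. */
    static void register(LatencyTrace bean) throws Exception {
        ManagementFactory.getPlatformMBeanServer().registerMBean(
            new StandardMBean(bean, LatencyTraceMBean.class),
            new ObjectName("org.example.pbs:type=LatencyTrace"));
    }
}
{code}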
[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457525#comment-13457525 ]

Peter Bailis commented on CASSANDRA-4261:
------------------------------------------

Jonathan,

Thanks for the rebase! Looking at the updated code, we can still log the start of the operation in MessagingService.sendRR() but move the reply timestamp logging from the ResponseVerbHandler to MessagingService.receive(). This won't be too bad, and we can filter the MessageIn instances passed to PBSPredictor by the verb type and/or by id. Does that make sense?

Also, re: CASSANDRA-4009, it should be possible to use this code, but there are two issues:

1.) We need finer-granularity tracing than what is currently implemented. We need to know how long it takes to hit a given node, not just the end-to-end round-trip latencies.

2.) Using a histogram instead of keeping around the actual latencies will reduce the fidelity of the predictions. The impact of this depends on the bucket size and distribution.

Let us know what you think!

[patch] Support consistency-latency prediction in nodetool
----------------------------------------------------------

Key: CASSANDRA-4261
URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
Project: Cassandra
Issue Type: New Feature
Components: Tools
Affects Versions: 1.2.0 beta 1
Reporter: Peter Bailis
Attachments: 4261-v4.txt, demo-pbs-v3.sh, pbs-nodetool-v3.patch
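A sketch of the timestamp bookkeeping described in the comment above: record send times at sendRR() and reply times at receive(), keyed by message id, so per-message round trips can be handed to the predictor. Only sendRR/receive are hook points named in the comment; the helper class itself is illustrative.

{code:java}
import java.util.concurrent.ConcurrentHashMap;

final class PbsLatencyLog {
    private static final ConcurrentHashMap<Integer, Long> sendTimesNanos =
        new ConcurrentHashMap<>();

    /** Call from sendRR(): remember when message 'id' left this coordinator. */
    static void onSend(int id) {
        sendTimesNanos.put(id, System.nanoTime());
    }

    /** Call from receive() for reply messages: compute the round trip. */
    static void onReply(int id) {
        Long sent = sendTimesNanos.remove(id);
        if (sent != null) {
            long rttMicros = (System.nanoTime() - sent) / 1_000;
            // hand rttMicros to the predictor's per-verb sample buffer here,
            // after filtering by verb type and/or id as described above
        }
    }
}
{code}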
[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437434#comment-13437434 ]

Peter Bailis commented on CASSANDRA-4261:
------------------------------------------

Is there anything else you'd like to have us do for the patch?
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261:

Description:

h3. Introduction

Cassandra supports a variety of replication configurations: {{ReplicationFactor}} is set per-ColumnFamily and {{ConsistencyLevel}} is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}
//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
Performing consistency prediction
100ms after a given write, with maximum version staleness of k=1

N=3, R=1, W=1
Probability of consistent reads: 0.678900
Average read latency: 5.377900ms (99.900th %ile 40ms)
Average write latency: 36.971298ms (99.900th %ile 294ms)

N=3, R=1, W=2
Probability of consistent reads: 0.791600
Average read latency: 5.372500ms (99.900th %ile 39ms)
Average write latency: 303.630890ms (99.900th %ile 357ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 5.426600ms (99.900th %ile 42ms)
Average write latency: 1382.650879ms (99.900th %ile 629ms)

N=3, R=2, W=1
Probability of consistent reads: 0.915800
Average read latency: 11.091000ms (99.900th %ile 348ms)
Average write latency: 42.663101ms (99.900th %ile 284ms)

N=3, R=2, W=2
Probability of consistent reads: 1.00
Average read latency: 10.606800ms (99.900th %ile 263ms)
Average write latency: 310.117615ms (99.900th %ile 335ms)

N=3, R=3, W=1
Probability of consistent reads: 1.00
Average read latency: 52.657501ms (99.900th %ile 565ms)
Average write latency: 39.949799ms (99.900th %ile 237ms)
{code}

h3. Demo

Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd <cassandra-source-dir with patch applied>
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=.
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!).

This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by calling {{enableConsistencyPredictionLogging()}} on the {{PBSPredictorMBean}}. Cassandra logs a variable number of latencies (controllable via JMX: {{setMaxLoggedLatenciesForConsistencyPrediction(int maxLogged)}}, default: 1). Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node (e.g., 10,000 logged latencies occupy 320,000 bytes). {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{setNumberTrialsForConsistencyPrediction(int numTrials)}} Monte Carlo trials per configuration.
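To make the mechanics concrete, here is a self-contained sketch of the kind of Monte Carlo estimation described above. This is illustrative only: the class name, method names, and latency traces are invented for this example; the actual patch samples from the four logged latency distributions (write propagation, write ack, read request, read response) inside Cassandra.

{code:title=Illustrative Monte Carlo sketch (not the patch's code)|borderStyle=solid}
import java.util.Arrays;
import java.util.Random;

public class PbsMonteCarloSketch {
    static final Random RNG = new Random(42);

    // Draw one sample from a logged latency trace (uniform resampling).
    static double sample(double[] logged) {
        return logged[RNG.nextInt(logged.length)];
    }

    // k-th smallest value of xs, 0-indexed.
    static double kthSmallest(double[] xs, int k) {
        double[] copy = xs.clone();
        Arrays.sort(copy);
        return copy[k];
    }

    // Estimate P(a read issued tAfterWrite ms after a write commits sees it)
    // for N=n replicas, read CL=r, write CL=w, over the given trials.
    static double predictConsistency(int n, int r, int w, double tAfterWrite,
                                     double[] wLat, double[] aLat,
                                     double[] rLat, double[] sLat, int trials) {
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++) {
            double[] writeArrive = new double[n]; // write reaches replica i
            double[] ackReturn = new double[n];   // replica i's ack returns
            for (int i = 0; i < n; i++) {
                writeArrive[i] = sample(wLat);
                ackReturn[i] = writeArrive[i] + sample(aLat);
            }
            // The write commits once the w-th fastest ack is back.
            double commitTime = kthSmallest(ackReturn, w - 1);
            double readStart = commitTime + tAfterWrite;

            double[] readArrive = new double[n];  // read reaches replica i
            double[] respReturn = new double[n];  // replica i's response returns
            for (int i = 0; i < n; i++) {
                readArrive[i] = readStart + sample(rLat);
                respReturn[i] = readArrive[i] + sample(sLat);
            }
            // The coordinator waits for the r fastest responses; the read is
            // consistent if any of those replicas already held the write.
            double cutoff = kthSmallest(respReturn, r - 1);
            for (int i = 0; i < n; i++) {
                if (respReturn[i] <= cutoff && writeArrive[i] <= readArrive[i]) {
                    consistent++;
                    break;
                }
            }
        }
        return (double) consistent / trials;
    }

    public static void main(String[] args) {
        // Fabricated traces standing in for logged latencies (ms).
        double[] wLat = {1, 2, 5, 10, 40};
        double[] aLat = {1, 1, 2, 3, 8};
        double[] rLat = {1, 2, 4, 9, 35};
        double[] sLat = {1, 1, 2, 4, 7};
        System.out.printf("P(consistent) = %f%n",
            predictConsistency(3, 1, 1, 100.0, wLat, aLat, rLat, sLat, 100000));
    }
}
{code}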
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: (was: pbs-nodetool-v2.patch)

[patch] Support consistency-latency prediction in nodetool
--
Key: CASSANDRA-4261
URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
Project: Cassandra
Issue Type: New Feature
Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
Attachments: demo-pbs-v3.sh, pbs-nodetool-v3.patch
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: (was: demo-pbs-v2.sh)
[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291541#comment-13291541 ] Peter Bailis commented on CASSANDRA-4261: - I agree that JMX would better. I'll work on changing this configuration and will post performance numbers shortly. I should be able to have this done in a week or so (latency due to my schedule, not due to task difficulty).
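For reference, a minimal sketch of what the resulting JMX-facing interface could look like. Only the method names are taken from this ticket; the grouping into a single MBean interface and the comments are assumptions, and the actual patch may differ:

{code:title=Hypothetical PBSPredictorMBean sketch|borderStyle=solid}
// Hypothetical shape of the PBSPredictorMBean discussed in this ticket;
// the method names appear in the comments above, everything else is assumed.
public interface PBSPredictorMBean {
    // Turn on tracing of per-request latency data.
    void enableConsistencyPredictionLogging();

    // Bound how many latencies are retained (and thus the sampling window).
    void setMaxLoggedLatenciesForConsistencyPrediction(int maxLogged);

    // Number of Monte Carlo trials run per ConsistencyLevel combination.
    void setNumberTrialsForConsistencyPrediction(int numTrials);
}
{code}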
[jira] [Comment Edited] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291541#comment-13291541 ] Peter Bailis edited comment on CASSANDRA-4261 at 6/8/12 4:07 AM: - I agree that JMX would work better. I'll work on changing this configuration and will post performance numbers shortly. I should be able to have this done in a week or so (latency due to my schedule, not due to task difficulty).

was (Author: pbailis): I agree that JMX would better. I'll work on changing this configuration and will post performance numbers shortly. I should be able to have this done in a week or so (latency due to my schedule, not due to task difficulty).
[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290801#comment-13290801 ] Peter Bailis commented on CASSANDRA-4261: -

re: performance, we haven't noticed anything, but we also haven't done much serious load testing. I agree that there shouldn't be much overhead, and the only thing I can think of possibly being a problem would be contention in the ConcurrentHashMap that maps requestIDs to lists of latencies. However, this *really* shouldn't be a problem. To quantify this, I can run and report numbers for something like stress on an EC2 cluster. Would that work? Are there existing performance regression tests? If you have a preference for a different workload or configuration, let me know.

re: the other config file settings, {{max_logged_latencies_for_consistency_prediction}} is possibly useful. Because we use an LRU policy for the latency logging, the number of latencies logged indirectly determines the window of time for sampling. If you want to capture a longer trace of network behavior, you'd increase the window, and if you wanted to do some on-the-fly tuning, you might shorten it. However, we could easily set this as a runtime configuration via nodetool instead.
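To illustrate the LRU windowing described above, here is a minimal sketch of an LRU-bounded latency log. This is an assumption for illustration, not the patch's actual code: once {{maxLogged}} request entries are held, the least-recently-used entry is evicted, so the retained latencies cover a sliding window of recent traffic, and growing {{maxLogged}} lengthens that window.

{code:title=Illustrative LRU-bounded latency log|borderStyle=solid}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatencyLogSketch {
    private final Map<String, List<Double>> latenciesByRequest;

    public LatencyLogSketch(final int maxLogged) {
        // accessOrder=true gives least-recently-used iteration order.
        this.latenciesByRequest = new LinkedHashMap<String, List<Double>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Double>> eldest) {
                return size() > maxLogged; // evict the LRU entry past the bound
            }
        };
    }

    // LinkedHashMap is not thread-safe, so this path is synchronized here;
    // the patch reportedly uses a ConcurrentHashMap instead, which avoids
    // coarse locking (the contention concern above) but needs a different
    // eviction strategy.
    public synchronized void record(String requestId, List<Double> latencies) {
        latenciesByRequest.put(requestId, latencies);
    }
}
{code}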
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: pbs-nodetool-v2.patch

Update to patch. Fixed a bug where two reads with the same latency were not treated separately. This required a two-line change that effectively excludes reads we've already considered for a given trial. Also added a check in the test for this case.
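A toy illustration of the failure mode behind this fix (invented for this note, not the patch's code): if per-trial samples are tracked by value, two reads that happen to share a latency collapse into one; consuming samples from a list by index keeps equal-valued reads distinct.

{code:title=Illustrative per-trial exclusion of consumed samples|borderStyle=solid}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class DistinctLatencySampling {
    public static void main(String[] args) {
        // Two reads legitimately share a 5.0ms latency; a Set<Double> would
        // collapse them, undercounting the reads considered in this trial.
        List<Double> trialLatencies = new ArrayList<>(Arrays.asList(5.0, 5.0, 7.0));
        Random rng = new Random();
        while (!trialLatencies.isEmpty()) {
            // remove(index) excludes the consumed sample without discarding
            // its equal-valued twin.
            double consumed = trialLatencies.remove(rng.nextInt(trialLatencies.size()));
            System.out.println("considered read with latency " + consumed + "ms");
        }
    }
}
{code}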
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: (was: pbs-nodetool-v1.patch)
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: demo-pbs-v2.sh

Updated hyperlink in demo script.
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261:

Description:

h3. Interface

This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes).
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: (was: demo-pbs.sh)
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: demo-pbs.sh

[patch] Support consistency-latency prediction in nodetool
--
Key: CASSANDRA-4261
URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
Project: Cassandra
Issue Type: New Feature
Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
Attachments: demo-pbs.sh, pbs-nodetool-v1.patch

h3. Introduction

Cassandra supports a variety of replication configurations: {{ReplicationFactor}} is set per-ColumnFamily and {{ConsistencyLevel}} is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}
//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1

N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir # source tree with the patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=.
ccm populate -n 5
ccm start
# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!).

This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} then predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these parameters, and the defaults work well. The more latencies they log, the better the predictions will be.
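To make the memory overhead concrete: if a node logged, say, 10,000 latencies per distribution (an illustrative figure, not necessarily the shipped default), the predicting node would hold 32 * 10,000 = 320,000 bytes, roughly 320 KB, of latency samples.

For intuition about what one Monte Carlo trial does, here's a rough, self-contained sketch in Java. This is *not* the patch's code: the exponential distributions, their means, and every name below are stand-ins for resampling the four logged distributions (write dispatch, write ack, read request, read response).

{code:title=Illustrative Monte Carlo trial (not the patch's implementation)|borderStyle=solid}
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class PbsTrialSketch {
    static final Random RNG = new Random(42);

    // Stand-in for resampling one of the four logged latency distributions (ms).
    static double sample(double meanMs) {
        return -meanMs * Math.log(1.0 - RNG.nextDouble());
    }

    // One trial: does a read at CL=r, issued tAfterWrite ms after a write at
    // CL=w returns, observe that write? (n = ReplicationFactor)
    static boolean consistentTrial(int n, int r, int w, double tAfterWrite) {
        double[] writeArrives = new double[n]; // when the write reaches replica i
        double[] ackReturns = new double[n];   // when replica i's ack returns
        for (int i = 0; i < n; i++) {
            writeArrives[i] = sample(5.0);
            ackReturns[i] = writeArrives[i] + sample(3.0);
        }
        double[] acks = ackReturns.clone();
        Arrays.sort(acks);
        double commit = acks[w - 1]; // the write returns after w acks

        double readStart = commit + tAfterWrite;
        double[] replyAt = new double[n];  // when reply i reaches the coordinator
        boolean[] fresh = new boolean[n];  // did the write beat the read request?
        for (int i = 0; i < n; i++) {
            double reqArrives = readStart + sample(4.0); // request reaches replica i
            replyAt[i] = reqArrives + sample(3.0);       // response returns
            fresh[i] = writeArrives[i] <= reqArrives;
        }
        // The coordinator uses the r fastest replies; the read is consistent
        // if any of them reflects the write.
        Integer[] byReply = new Integer[n];
        for (int i = 0; i < n; i++) byReply[i] = i;
        Arrays.sort(byReply, Comparator.comparingDouble(i -> replyAt[i]));
        for (int j = 0; j < r; j++)
            if (fresh[byReply[j]]) return true;
        return false;
    }

    public static void main(String[] args) {
        int trials = 100_000, consistent = 0;
        for (int t = 0; t < trials; t++)
            if (consistentTrial(3, 1, 1, 100.0)) consistent++;
        System.out.printf("Probability of consistent reads: %f%n",
                          (double) consistent / trials);
    }
}
{code}

A trial declares the read consistent if any of the R fastest replies comes from a replica the write had already reached; averaging many such trials yields the probabilities in the example output above. In the patch the samples would come from the logged latency data rather than synthetic distributions, which is what ties the predictions to the actual deployment.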
[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281391#comment-13281391 ] Peter Bailis commented on CASSANDRA-4261: - I've provided a bash script that performs a full end-to-end demonstration of this patch, in case you didn't want to pull a clean source tree, patch it, and copy and paste the commands above. The script clones Cassandra trunk, applies the patch, then spins up and profiles a local 5-node cluster using ccm as above. The script isn't robust, but it should be easy enough to debug. Enjoy!
[jira] [Created] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
Peter Bailis created CASSANDRA-4261:
---
Summary: [Patch] Support consistency-latency prediction in nodetool
Key: CASSANDRA-4261
URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
Project: Cassandra
Issue Type: New Feature
Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Attachment: pbs-nodetool-v1.patch

Last commit to Cassandra fork for this patch is at https://github.com/pbailis/cassandra-pbs/commit/6e0ac68b43a7e6692423abf760edf88d633dd04d
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Comment: was deleted (was: Last commit to Cassandra fork for this patch is at https://github.com/pbailis/cassandra-pbs/commit/6e0ac68b43a7e6692423abf760edf88d633dd04d)
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h1. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h1. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h1. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies (each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node) and then predicts the latency and consistency for each possible ConsistencyLevel setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these parameters, and the defaults work well. The more latencies they log, the better the predictions will be. We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h2. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h2. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h2. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies (each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node) and then predicts the latency and consistency for each possible ConsistencyLevel setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these parameters, and the defaults work well. The more latencies they log, the better the predictions will be. We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}// {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies (each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node) and then predicts the latency and consistency for each possible ConsistencyLevel setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these parameters, and the defaults work well. The more latencies they log, the better the predictions will be. We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}// {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}s). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimates for most users.

h3. Interface

This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs up to {{max_logged_latencies_for_consistency_prediction}} latencies per distribution. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory on the predicting node (logging 10,000 latencies, say, costs about 320 KB).

{{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these parameters.
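To make the trial structure concrete, here is a rough sketch of how one such Monte Carlo trial can be computed under the WARS model from the PBS paper, assuming the four logged distributions are per-message write-dispatch (W), write-ack (A), read-dispatch (R), and read-response (S) latencies. This is an illustration in Python rather than the patch's actual code, and every name in it is hypothetical:

{code:title=Illustrative Monte Carlo trial (not part of the patch)|borderStyle=solid}
import random

def one_trial(n, r, w, t_after_write, w_lat, a_lat, r_lat, s_lat):
    """One WARS trial: True iff a read issued t_after_write ms after
    the write commits returns the new value (the k=1 case)."""
    # resample one latency per replica from each logged distribution
    ws   = [random.choice(w_lat) for _ in range(n)]  # write reaches replica i at ws[i]
    acks = [random.choice(a_lat) for _ in range(n)]  # ack arrives at ws[i] + acks[i]
    rds  = [random.choice(r_lat) for _ in range(n)]  # read reaches replica i rds[i] ms after issue
    resp = [random.choice(s_lat) for _ in range(n)]  # response arrives rds[i] + resp[i] after issue

    # the write commits once the w-th fastest replica has acked
    commit = sorted(ws[i] + acks[i] for i in range(n))[w - 1]
    read_start = commit + t_after_write

    # the read returns the r fastest responses; replica i sees the read at
    # read_start + rds[i] and holds the new version iff the write arrived first
    responders = sorted(range(n), key=lambda i: rds[i] + resp[i])[:r]
    return any(ws[i] <= read_start + rds[i] for i in responders)

def predict(n, r, w, t_after_write, trials, *dists):
    return sum(one_trial(n, r, w, t_after_write, *dists) for _ in range(trials)) / trials

# e.g., N=3, R=1, W=1, 100ms after commit, with made-up latency samples:
# p = predict(3, 1, 1, 100.0, 10000, w_lat, a_lat, r_lat, s_lat)
{code}

Each trial resamples one latency per replica from each distribution, computes when the write "commits" (the w-th fastest ack), and checks whether any of the r fastest-responding read replicas had already received the write when the read reached it; the fraction of consistent trials estimates the probability that {{predictconsistency}} reports.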
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? nodetool predictconsistency provides this: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}\\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). 
This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users shouldn't have to touch these
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper||http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but QUORUM is often slower than ONE, TWO, or THREE. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte Carlo trials per configuration. Users
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within nodetool. Users can accurately predict Cassandra behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}s). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}}
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}s). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running {{number_trials_for_consistency_prediction}} Monte
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch, exposed by {{nodetool predictconsistency}} provides answers: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than trying out different configurations (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}s). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}s). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description: h3. Introduction Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose? This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}: {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}} \\ \\ {code:title=Example output|borderStyle=solid} //N == ReplicationFactor //R == read ConsistencyLevel //W == write ConsistencyLevel user@test:$ nodetool predictconsistency 3 100 1 100ms after a given write, with maximum version staleness of k=1 N=3, R=1, W=1 Probability of consistent reads: 0.811700 Average read latency: 6.896300ms (99.900th %ile 174ms) Average write latency: 8.788000ms (99.900th %ile 252ms) N=3, R=1, W=2 Probability of consistent reads: 0.867200 Average read latency: 6.818200ms (99.900th %ile 152ms) Average write latency: 33.226101ms (99.900th %ile 420ms) N=3, R=1, W=3 Probability of consistent reads: 1.00 Average read latency: 6.766800ms (99.900th %ile 111ms) Average write latency: 153.764999ms (99.900th %ile 969ms) N=3, R=2, W=1 Probability of consistent reads: 0.951500 Average read latency: 18.065800ms (99.900th %ile 414ms) Average write latency: 8.322600ms (99.900th %ile 232ms) N=3, R=2, W=2 Probability of consistent reads: 0.983000 Average read latency: 18.009001ms (99.900th %ile 387ms) Average write latency: 35.797100ms (99.900th %ile 478ms) N=3, R=3, W=1 Probability of consistent reads: 0.993900 Average read latency: 101.959702ms (99.900th %ile 1094ms) Average write latency: 8.518600ms (99.900th %ile 236ms) {code} h3. Demo Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast: {code:borderStyle=solid} cd cassandra-source-dir with patch applied ant # turn on consistency logging sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml ccm create consistencytest --cassandra-dir=. ccm populate -n 5 ccm start # if start fails, you might need to initialize more loopback interfaces # e.g., sudo ifconfig lo0 alias 127.0.0.2 # use stress to get some sample latency data tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1 {code} h3. What and Why We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments. 
In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline limitations of the current approach after describing how it's done. We believe that this is a useful feature that can provide guidance and fairly accurate estimation for most users. h3. Interface This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory for the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes) by running
[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Description:

h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or {{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users can accurately predict Cassandra's behavior in their production environments without interfering with performance. What's the probability that we'll read a write t seconds after it completes? What about reading one of the last k writes? This patch provides answers via {{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}
// N == ReplicationFactor
// R == read ConsistencyLevel
// W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1

N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd <cassandra-source-dir with patch applied>
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=.
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting consistency-latency trade-offs within Cassandra. Our [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range of Dynamo-style data store deployments at places like LinkedIn and Yammer in addition to profiling our own Cassandra deployments.

In our experience, prediction is both accurate and much more lightweight than profiling and manually testing each possible replication configuration (especially in production!). This analysis is important for the many users we've talked to and heard about who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime environment and, short of profiling in production, there's no existing way to answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, meaning users don't know how stale their data will be.) We outline the limitations of the current approach after describing how it's done. We believe this is a useful feature that can provide guidance and fairly accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using {{nodetool}}. Users enable tracing of latency data by setting {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. Cassandra logs up to {{max_logged_latencies_for_consistency_prediction}} latencies. Each latency is 8 bytes, and there are 4 distributions we require, so the space overhead is {{32*logged_latencies}} bytes of memory on the predicting node. {{nodetool predictconsistency}} predicts the latency and consistency for each possible {{ConsistencyLevel}} setting (reads and writes).
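To make the prediction mechanics concrete, here is a minimal, hypothetical sketch of a WARS-style Monte Carlo estimate in the spirit of PBS. This is not the patch's code: the exponential sampler stands in for the four empirically logged latency distributions, and the read one-way delay is approximated as half the read round trip.

{code:borderStyle=solid}
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class PbsSketch
{
    static final Random RNG = new Random(42);

    // Placeholder sampler: a real predictor would draw from the logged
    // empirical distributions rather than an exponential.
    static double sampleMs(double meanMs)
    {
        return -meanMs * Math.log(1.0 - RNG.nextDouble());
    }

    // Probability that a read at CL=r observes a write at CL=w, issued
    // tMs after the write completes, for replication factor n.
    static double predict(int n, int r, int w, double tMs, int trials)
    {
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++)
        {
            double[] writeArrives = new double[n]; // write reaches replica i
            double[] ackArrives = new double[n];   // replica i's ack returns
            double[] readRtt = new double[n];      // read round trip via replica i
            for (int i = 0; i < n; i++)
            {
                writeArrives[i] = sampleMs(5);
                ackArrives[i] = writeArrives[i] + sampleMs(5);
                readRtt[i] = sampleMs(5) + sampleMs(5);
            }

            // The write "completes" once the w-th fastest ack arrives.
            double[] sortedAcks = ackArrives.clone();
            Arrays.sort(sortedAcks);
            double commit = sortedAcks[w - 1];

            // The coordinator waits for the r fastest read responses.
            Integer[] byReadRtt = new Integer[n];
            for (int i = 0; i < n; i++)
                byReadRtt[i] = i;
            Arrays.sort(byReadRtt, Comparator.comparingDouble(i -> readRtt[i]));

            boolean sawLatest = false;
            for (int k = 0; k < r; k++)
            {
                int i = byReadRtt[k];
                // Replica i serves the new version iff the write reached it
                // before the read did (read issued tMs after commit).
                if (writeArrives[i] <= commit + tMs + readRtt[i] / 2)
                    sawLatest = true;
            }
            if (sawLatest)
                consistent++;
        }
        return consistent / (double) trials;
    }

    public static void main(String[] args)
    {
        System.out.printf("N=3, R=1, W=1, t=100ms: ~%.4f%n",
                          predict(3, 1, 1, 100, 100_000));
    }
}
{code}

The sketch only captures the single-version (k=1) consistency probability; the tool's actual output above additionally reports k-version staleness and read/write latency percentiles derived from the same sampled distributions.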
[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bailis updated CASSANDRA-4261: Summary: [patch] Support consistency-latency prediction in nodetool (was: [Patch] Support consistency-latency prediction in nodetool)

[patch] Support consistency-latency prediction in nodetool
--
Key: CASSANDRA-4261
URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
Project: Cassandra
Issue Type: New Feature
Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
Attachments: pbs-nodetool-v1.patch