[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-14 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-6465:
---

Attachment: throughput.png
99th_latency.png

Attached are two graphs of throughput and 99th percentile latencies for four 
runs of stress: two with the time penalty kept in DES and two with it removed. 
In each configuration, one run was a normal stress read of 3 million rows, and 
the other was a run where one of the three nodes was suspended 30 seconds into 
the run and resumed 60 seconds into the run.

In short, there's no difference in throughput or median/95th/99th latencies 
when a node goes down with the time penalty removed, so it looks like rapid 
read protection does indeed dominate there.

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.5

 Attachments: 99th_latency.png, des-score-graph.png, 
 des.sample.15min.csv, get-scores.py, throughput.png


 To quote the conf:
 {noformat}
 # if set greater than zero and read_repair_chance is < 1.0, this will allow
 # 'pinning' of replicas to hosts in order to increase cache capacity.
 # The badness threshold will control how much worse the pinned host has to be
 # before the dynamic snitch will prefer other replicas over it.  This is
 # expressed as a double which represents a percentage.  Thus, a value of
 # 0.2 means Cassandra would continue to prefer the static snitch values
 # until the pinned host was 20% worse than the fastest.
 dynamic_snitch_badness_threshold: 0.1
 {noformat}
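 The rule described in that comment can be sketched as a single comparison. 
 This is only a reading of the comment, not the dynamic snitch's actual code: 
 the function name, the assumption that a lower score is better, and the exact 
 form of the comparison are all illustrative.

```python
def prefer_pinned(pinned_score, best_other_score, badness_threshold=0.1):
    """Return True while the pinned replica should still be preferred.

    Assumptions (not taken from the Cassandra source): lower scores are
    better, and "X% worse" means the pinned score exceeds the best other
    score by more than that fraction.
    """
    return pinned_score <= best_other_score * (1.0 + badness_threshold)


if __name__ == "__main__":
    # With a threshold of 0.2, a pinned host that is 15% worse stays pinned,
    # while one that is 25% worse is abandoned.
    print(prefer_pinned(1.15, 1.0, badness_threshold=0.2))
    print(prefer_pinned(1.25, 1.0, badness_threshold=0.2))
```

 Under this reading, pinning only holds if the pinned host's score stays 
 within the threshold of the best score from one sample to the next.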
 An assumption of this feature is that scores will vary by less than 
 dynamic_snitch_badness_threshold during normal operations.  Attached is the 
 result of polling a node for the scores of 6 different endpoints at 1 Hz for 
 15 minutes.  The endpoints to sample were chosen with `nodetool getendpoints` 
 for a row that is known to get reads.  The node was acting as a coordinator for 
 a few hundred req/second, so it should have sufficient data to work with.  
 Other traces on a second cluster have produced similar results.
  * The scores vary by far more than I would expect, as shown by the difficulty 
 of seeing anything useful in that graph.
  * The difference between the best and next-best score is usually more than 
 10% (default dynamic_snitch_badness_threshold).
 Neither ClientRequest nor ColumnFamily metrics showed wild changes during the 
 data gathering period.
 Attachments:
  * Jython script cobbled together to gather the data (based on work on the 
 mailing list from Maki Watanabe a while back)
  * CSV of DES scores for 6 endpoints, polled about once a second
  * Attempt at making a graph
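
 One way to put a number on the second observation is to count how often the 
 gap between the best and next-best score exceeds the threshold. The sketch 
 below is not the attached get-scores.py. It assumes a hypothetical CSV layout 
 (a header row, then a timestamp column followed by one score column per 
 endpoint) and that lower scores are better. The real layout of 
 des.sample.15min.csv may differ.

```python
import csv
import io


def fraction_exceeding(csv_text, threshold=0.1):
    """Fraction of samples whose next-best score is more than `threshold`
    (relative) worse than the best score.

    Assumed layout: header row, then rows of `timestamp, score, score, ...`.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the assumed header row
    total = 0
    exceeded = 0
    for row in reader:
        scores = sorted(float(value) for value in row[1:])
        if len(scores) < 2 or scores[0] <= 0:
            continue  # cannot form a relative gap for this sample
        total += 1
        if scores[1] > scores[0] * (1.0 + threshold):
            exceeded += 1
    return exceeded / total if total else 0.0


# Made-up sample data, only to show the call.
SAMPLE = (
    "ts,a,b,c\n"
    "0,1.0,1.05,2.0\n"
    "1,1.0,1.5,1.6\n"
    "2,0.8,0.81,0.9\n"
    "3,1.0,1.2,1.3\n"
)

if __name__ == "__main__":
    print(fraction_exceeding(SAMPLE, threshold=0.1))
```

 A result close to 1.0 on real data would mean the gap is almost always wider 
 than the threshold.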



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-14 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-6465:
---

Attachment: 6465-v1.patch

6465-v1.patch (and 
[branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-6465]) removes the 
timePenalty component from the DES score.
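
As a rough illustration of what dropping that component means, here is a toy 
model. It is not the DynamicEndpointSnitch code: it simply assumes a score 
built from a normalized latency term plus a normalized "time since last reply" 
term, and every name and the normalization are hypothetical.

```python
def toy_score(median_latency_ms, max_latency_ms,
              ms_since_last_reply, max_penalty_ms,
              include_time_penalty=True):
    """Toy endpoint score; lower is assumed to be better.

    The latency term is the endpoint's median latency relative to the
    slowest endpoint. The optional time-penalty term grows with the time
    since the endpoint last replied.
    """
    score = median_latency_ms / max_latency_ms
    if include_time_penalty:
        score += ms_since_last_reply / max_penalty_ms
    return score


if __name__ == "__main__":
    # Two replicas with identical latency but different reply recency.
    for idle_ms in (0, 50):
        with_penalty = toy_score(10, 20, idle_ms, 100)
        without_penalty = toy_score(10, 20, idle_ms, 100,
                                    include_time_penalty=False)
        print(idle_ms, with_penalty, without_penalty)
```

In this model, two replicas with the same latency get different scores while 
the penalty term is present, and identical scores once it is removed.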

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.5

 Attachments: 6465-v1.patch, 99th_latency.png, des-score-graph.png, 
 des.sample.15min.csv, get-scores.py, throughput.png




[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-14 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-6465:
---

Attachment: des-scores-with-penalty.csv
des-scores-without-penalty.csv

Attached are the DES scores from a run with and without the time penalty.  This 
was done with a three node CCM cluster. node1 coordinated all reads, and node2 
and node3 were the replicas for all reads.  In both runs, node2 served most of 
the reads (as reported by cfstats).

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.5

 Attachments: 6465-v1.patch, 99th_latency.png, des-score-graph.png, 
 des-scores-with-penalty.csv, des-scores-without-penalty.csv, 
 des.sample.15min.csv, get-scores.py, throughput.png




[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-02 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6465:
--

Since Version: 1.2.0

This was introduced by CASSANDRA-3722.  It's not clear to me what that code is 
trying to do.  Or maybe I'm still grumpy about calling i/o activity severity.

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.5

 Attachments: des-score-graph.png, des.sample.15min.csv, get-scores.py




[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2013-12-09 Thread Chris Burroughs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Burroughs updated CASSANDRA-6465:
---

Attachment: des-score-graph.png
des.sample.15min.csv
get-scores.py

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
 Attachments: des-score-graph.png, des.sample.15min.csv, get-scores.py




[jira] [Updated] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2013-12-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-6465:
--

 Priority: Minor  (was: Major)
Fix Version/s: 2.0.4
 Assignee: Tyler Hobbs
   Labels: gossip  (was: )

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.4

 Attachments: des-score-graph.png, des.sample.15min.csv, get-scores.py

