[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-14 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871089#comment-13871089
 ] 

Brandon Williams commented on CASSANDRA-6465:
-

Can we get some numbers on score fluctuation with the time penalty removed to 
be certain this fixes it?
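
For reference, scores like these can be sampled over JMX from the dynamic
snitch MBean, which is what the attached get-scores.py does in Jython. Below
is a minimal Java sketch of the same idea; it assumes the usual
org.apache.cassandra.db:type=DynamicEndpointSnitch object name with its Scores
attribute, and a hypothetical coordinator at 127.0.0.1:7199.
{noformat}
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SnitchScorePoller
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical coordinator address; 7199 is Cassandra's default JMX port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName snitch =
                new ObjectName("org.apache.cassandra.db:type=DynamicEndpointSnitch");

            // Poll the per-endpoint scores roughly once a second, as in the
            // attached 15-minute sample.
            for (int i = 0; i < 900; i++)
            {
                @SuppressWarnings("unchecked")
                Map<java.net.InetAddress, Double> scores =
                    (Map<java.net.InetAddress, Double>) mbs.getAttribute(snitch, "Scores");
                System.out.println(System.currentTimeMillis() + " " + scores);
                Thread.sleep(1000);
            }
        }
    }
}
{noformat}
Polling that attribute for a few minutes before and after removing the time
penalty would give the comparison asked for here.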

 DES scores fluctuate too much for cache pinning
 ---

 Key: CASSANDRA-6465
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6465
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: 1.2.11, 2 DC cluster
Reporter: Chris Burroughs
Assignee: Tyler Hobbs
Priority: Minor
  Labels: gossip
 Fix For: 2.0.5

 Attachments: 6465-v1.patch, 99th_latency.png, des-score-graph.png, 
 des.sample.15min.csv, get-scores.py, throughput.png


 To quote the conf:
 {noformat}
 # if set greater than zero and read_repair_chance is < 1.0, this will allow
 # 'pinning' of replicas to hosts in order to increase cache capacity.
 # The badness threshold will control how much worse the pinned host has to be
 # before the dynamic snitch will prefer other replicas over it.  This is
 # expressed as a double which represents a percentage.  Thus, a value of
 # 0.2 means Cassandra would continue to prefer the static snitch values
 # until the pinned host was 20% worse than the fastest.
 dynamic_snitch_badness_threshold: 0.1
 {noformat}
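 To make the threshold concrete, here is a small illustrative sketch (not the
 actual DynamicEndpointSnitch code) of the comparison described in the config
 comment above: with the default of 0.1, the statically preferred host keeps
 winning until its score is more than 10% worse than the best. Lower scores
 are better.
 {noformat}
 // Illustrative only -- a sketch of the documented badness_threshold
 // behaviour, not the actual snitch code.
 public final class BadnessThresholdSketch
 {
     /** Keep the statically preferred ("pinned") replica unless its score is
      *  more than badnessThreshold worse than the best-scoring replica. */
     static boolean keepPinnedReplica(double pinnedScore, double bestScore,
                                      double badnessThreshold)
     {
         return pinnedScore <= bestScore * (1.0 + badnessThreshold);
     }

     public static void main(String[] args)
     {
         // With the default threshold of 0.1: a pinned score within 10% of the
         // best keeps the static ordering; one more than 10% worse does not.
         System.out.println(keepPinnedReplica(0.105, 0.100, 0.1)); // true
         System.out.println(keepPinnedReplica(0.115, 0.100, 0.1)); // false
     }
 }
 {noformat}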
 An assumption of this feature is that scores will vary by less than 
 dynamic_snitch_badness_threshold during normal operations.  Attached is the 
 result of polling a node for the scores of 6 different endpoints at 1 Hz for 
 15 minutes.  The endpoints to sample were chosen with `nodetool getendpoints` 
 for a row that is known to get reads.  The node was acting as a coordinator for 
 a few hundred req/second, so it should have sufficient data to work with.  
 Other traces on a second cluster have produced similar results.
  * The scores vary by far more than I would expect, as shown by the difficulty 
 of seeing anything useful in that graph.
  * The difference between the best and next-best score is usually < 10% 
 (default dynamic_snitch_badness_threshold).
 Neither ClientRequest nor ColumnFamily metrics showed wild changes during the 
 data gathering period.
 Attachments:
  * jython script cobbled together to gather the data (based on work on the 
 mailing list from Maki Watanabe a while back)
  * csv of DES scores for 6 endpoints, polled about once a second
  * Attempt at making a graph





[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-10 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868234#comment-13868234
 ] 

Tyler Hobbs commented on CASSANDRA-6465:


[~ianbarfield] thanks for the analysis; you make some excellent observations.

From the discussion in CASSANDRA-3722, it seems like the two motivations for 
the time penalty were these:
# When a node dies, the FD will not mark it down for a while; in the meantime, 
we'd like to stop sending queries to it
# In a multi-DC setup, we would like to penalize the remote DC, but not so much 
that we won't ever use it when local nodes become very slow

I suspect that rapid read protection (CASSANDRA-4705) does a good job of 
mitigating the #1 case until the FD marks the node down.  I'll do some testing 
to confirm this.

I don't feel like the #2 case needs special treatment from the dynamic snitch, 
especially with the badness_threshold in effect.  Latency to the remote DC 
should prevent it from being used under normal circumstances.  If users really 
want to guarantee that, the LOCAL consistency levels are always available.



[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-10 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868251#comment-13868251
 ] 

Brandon Williams commented on CASSANDRA-6465:
-

The best way to test #1 is to run in foreground mode and then suspend (^Z) the 
JVM.



[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-06 Thread Ian Barfield (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863309#comment-13863309
 ] 

Ian Barfield commented on CASSANDRA-6465:
-

I believe the purpose of the time penalty was to more quickly detect problematic 
nodes. If a node was suddenly suffering severe issues, that wouldn't be 
reflected in its latency metric until the current outstanding queries resolved. 
That might take until the maximum duration timeout, which can be arbitrarily 
long, and in many cases is a lot longer than you'd like. By using timeDelay, 
the snitch can penalize problem nodes almost immediately, since the queries do 
not have to time out first. That said, it has numerous flaws, both conceptually 
and in its implementation.

I was working on this problem a couple of weeks ago, but have been distracted 
since, so I might not be able to give the best summary. Here are a couple of 
issues off the top of my head, though:
- if the time delay values are low, then high jitter throws the scores way off. 
It isn't unreasonable to expect situations where the time delay shifts 
semi-randomly between 0 and 1 ms. This means very little in terms of whether a 
node is a suitable target but can cause a drastic difference in scores if there 
is no slow node to anchor the scores.
- if the node response periods aren't low, say they average around 50 ms, then 
by definition the time delays are highly random, since the score could be 
calculated at any point along that 0 to 50 ms interval.
- it has a lot of complex interactions outside of its original scope of 
detecting bad nodes
- when calculating scores, if there is no lastReceived value for a node (e.g. 
the node has just been added to the cluster), then the logic defaults to using 
the current time (essentially 0, or maximum 'good'). You might instead take the 
view that an unproven, cache-cold node would be a bad selection.
- sensitive to local noise. Each time the score is calculated, the timePenalty 
is calculated fresh. Since there is no concept of persistence or scope, events 
that corrupt the scoring process are extra harmful, e.g. GC, CPU load / thread 
scheduling, and concurrency shenanigans occurring between the lastReceived.get() 
and System.currentTimeMillis() calls.

Some of these issues are somewhat alleviated by the switch to using nanos, and 
I've been tempted to backport that for this class, at least for testing, but 
this logic fails in complex ways. I think at some point I was able to confirm 
some wildly fluctuating values of the subcomponents of the scores (specifically 
timePenalty) by checking the MBeans and working under the assumption that 
timePenalty was likely the only component behind well-rounded scores -- if at 
least one node's timePenalty exceeds the cap, it gets cut off to 
UPDATE_INTERVAL_IN_MS, which as a divisor makes for nicely formed floating 
point numbers.
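
For readers following along, here is a rough sketch of the kind of timePenalty
calculation described in this thread -- pieced together from the comments
above, not copied from the DynamicEndpointSnitch source, so details may differ.
{noformat}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the timePenalty behaviour described in this thread; the real
// logic lives in DynamicEndpointSnitch and may differ in detail.
class TimePenaltySketch
{
    // 100 ms is the default dynamic_snitch_update_interval_in_ms.
    static final long UPDATE_INTERVAL_IN_MS = 100;

    final Map<InetAddress, Long> lastReceived = new ConcurrentHashMap<>();

    double timePenalty(InetAddress host)
    {
        // No lastReceived entry (e.g. a brand new node) defaults to "now",
        // i.e. zero penalty -- the "maximum good" default criticised above.
        long last = lastReceived.getOrDefault(host, System.currentTimeMillis());
        long sinceLast = System.currentTimeMillis() - last;

        // Cap at the update interval, then normalise by it, so the penalty
        // falls in [0, 1].  A host that answered within the last millisecond
        // or two contributes ~0.00-0.01 purely on jitter, which is enough to
        // reorder replicas whose latency-based score components are nearly
        // identical.
        long capped = Math.min(sinceLast, UPDATE_INTERVAL_IN_MS);
        return (double) capped / UPDATE_INTERVAL_IN_MS;
    }
}
{noformat}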

There are also a lot of issues with the other score components, and some of the 
overall logic, but... some other time. Apologies if I've gotten something quite 
wrong; I've never really used Cassandra.


[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2014-01-02 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861012#comment-13861012
 ] 

Tyler Hobbs commented on CASSANDRA-6465:


I can reproduce Chris's results, and in my experimentation it looks like almost 
all of the variation is due to the timePenalty, which is basically how long 
it has been since the last entry from an endpoint.  I can see why something 
like the time penalty might be useful for the phi FD, which expects messages on 
a periodic basis, but it doesn't make sense to me to use it in a load balancing 
measure.  My suggestion would be to remove the time penalty.

bq. Are we sure that this mechanism of producing cache pinning is worth the 
complexity here, especially given speculative execution?

Effective cache utilization is extremely important, so I would say it's well 
worth the additional complexity.  I don't think speculative execution should 
affect this greatly, but I might be missing something; care to expand on that?



[jira] [Commented] (CASSANDRA-6465) DES scores fluctuate too much for cache pinning

2013-12-16 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849949#comment-13849949
 ] 

Robert Coli commented on CASSANDRA-6465:


Are we sure that this mechanism of producing cache pinning is worth the 
complexity here, especially given speculative retry? 




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)