[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-11738:
---------------------------------------
    Attachment: 11738.txt

Patch attached that allows manual Severity injection but removes the compaction component. I have left the external Severity API the same for compatibility purposes even though we could come up with a better fit now that we aren't trying to shoehorn two use cases into a single value.

[~jjordan], can you review?

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremiah Jordan
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: 11738.txt
>
>
> CASSANDRA-11737 was opened to allow completely disabling the use of severity
> in the DynamicEndpointSnitch calculation, but that is a pretty big hammer.
> There is probably something we can do to better use the score.
> The issue seems to be that severity is given equal weight with latency in the
> current code, and also that severity is only based on disk IO. If you have a
> node that is CPU bound on something (say catching up on LCS compactions
> because of bootstrap/repair/replace) the IO wait can be low, but the latency
> to the node is high.
> Some ideas I had are:
> 1. Allowing a yaml parameter to tune how much impact the severity score has
> in the calculation.
> 2. Taking CPU load into account as well as IO wait (this would probably help
> in the cases I have seen things go sideways)
> 3. Move the -D from CASSANDRA-11737 to being a yaml level setting
> 4. Go back to just relying on latency and get rid of severity altogether.
> Now that we have rapid read protection, maybe just using latency is enough,
> as it can help where the predictive nature of IO wait would have been useful.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
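
The weighting problem described in the issue above can be illustrated with a small sketch. This is not Cassandra's DynamicEndpointSnitch code; it only assumes, as the description states, that a host's score is a normalized latency plus a severity term, with lower scores preferred. The class name, the `scores` method, and the `severityWeight` parameter are hypothetical and stand in for the yaml tunable proposed in idea 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a toy model of the scoring discussed in the issue, not Cassandra code. */
public class SeverityWeightSketch {
    /**
     * Scores each host as (latency / worst latency) + severityWeight * severity.
     * Lower is better. severityWeight is a hypothetical tunable (idea 1):
     * 1.0 models "equal weight with latency", 0.0 models "latency only" (idea 4).
     */
    public static Map<String, Double> scores(Map<String, Double> latencyMs,
                                             Map<String, Double> severity,
                                             double severityWeight) {
        double maxLatency = 1.0; // floor of 1 avoids division by zero
        for (double latency : latencyMs.values())
            maxLatency = Math.max(maxLatency, latency);

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : latencyMs.entrySet()) {
            double sev = severity.getOrDefault(e.getKey(), 0.0);
            result.put(e.getKey(), e.getValue() / maxLatency + severityWeight * sev);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> latency = new LinkedHashMap<>();
        latency.put("nodeA", 40.0); // CPU bound: slow to respond, but no IO wait
        latency.put("nodeB", 10.0); // fast to respond, but reporting IO wait

        Map<String, Double> severity = new LinkedHashMap<>();
        severity.put("nodeA", 0.0);
        severity.put("nodeB", 0.9);

        // Equal weight: the fast node ends up with the worse (higher) score.
        System.out.println(scores(latency, severity, 1.0));
        // Latency only: the fast node wins.
        System.out.println(scores(latency, severity, 0.0));
    }
}
```

With equal weight, the CPU-bound node that reports no IO wait is preferred over the node that actually answers faster, which is the behaviour the issue complains about. Lowering the hypothetical weight shifts the ranking back toward observed latency.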
[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-11738:
---------------------------------------
       Assignee: Jonathan Ellis
       Priority: Minor  (was: Major)
       Reviewer: Jeremiah Jordan  (was: Robert Stupp)
    Component/s: Core

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremiah Jordan
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 3.x

[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-11738:
---------------------------------------
    Assignee:     (was: Jonathan Ellis)

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremiah Jordan
>             Fix For: 3.x

[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-11738:
------------------------------------------
    Assignee: Jonathan Ellis

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremiah Jordan
>            Assignee: Jonathan Ellis
>             Fix For: 3.x

[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua McKenzie updated CASSANDRA-11738:
----------------------------------------
    Reviewer: Robert Stupp

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremiah Jordan
>            Assignee: Jonathan Ellis
>             Fix For: 3.x

[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-11738:
------------------------------------------
    Issue Type: Improvement  (was: Bug)

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeremiah Jordan
>             Fix For: 3.x

[jira] [Updated] (CASSANDRA-11738) Re-think the use of Severity in the DynamicEndpointSnitch calculation
[ https://issues.apache.org/jira/browse/CASSANDRA-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremiah Jordan updated CASSANDRA-11738:
----------------------------------------
    Fix Version/s: 3.x

> Re-think the use of Severity in the DynamicEndpointSnitch calculation
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-11738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11738
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>             Fix For: 3.x