[jira] [Comment Edited] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372232#comment-16372232 ] Dikang Gu edited comment on CASSANDRA-14252 at 2/27/18 5:19 AM: |[3.0 | https://github.com/DikangGu/cassandra/commit/163ad12c7c84cfd44932b814ce61c7a0145e9bcd]|[3.11 | https://github.com/DikangGu/cassandra/commit/980a7c36d410448b3714111b541193b68c952d34]|[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/20]| was (Author: dikanggu): |[3.0 | https://github.com/DikangGu/cassandra/commit/163ad12c7c84cfd44932b814ce61c7a0145e9bcd]|[3.11 | https://github.com/DikangGu/cassandra/commit/980a7c36d410448b3714111b541193b68c952d34]|[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/14]| > Use zero as default score in DynamicEndpointSnitch > -- > > Key: CASSANDRA-14252 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14252 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > Fix For: 4.0, 3.0.17, 3.11.3 > > > The problem I want to solve is that I found in our deployment, one slow but > alive data node can slow down the whole cluster, even caused timeout of our > requests. > We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the > DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node > latency is too high. > I added some debug log, and figured out that in a lot of cases, the score > from remote data node was not populated, so the fallback to > sortByProximityWithScore never happened. That's why a single slow data node, > can cause huge problems to the whole cluster. > In this jira, I'd like to use zero as default score, so that we will get a > chance to try remote data node, if local one is slow. > I tested it in our test cluster, it improved the client latency in single > slow data node case significantly. > I flag this as a Bug, because it caused problems to our use cases multiple > times. > logs === > _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [0.0]_ > _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372232#comment-16372232 ] Dikang Gu edited comment on CASSANDRA-14252 at 2/22/18 10:38 PM: - |[3.0 | https://github.com/DikangGu/cassandra/commit/163ad12c7c84cfd44932b814ce61c7a0145e9bcd]|[3.11 | https://github.com/DikangGu/cassandra/commit/980a7c36d410448b3714111b541193b68c952d34]|[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/14]| was (Author: dikanggu): |[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/14]| > Use zero as default score in DynamicEndpointSnitch > -- > > Key: CASSANDRA-14252 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14252 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > Fix For: 4.0, 3.11.x > > > The problem I want to solve is that I found in our deployment, one slow but > alive data node can slow down the whole cluster, even caused timeout of our > requests. > We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the > DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node > latency is too high. > I added some debug log, and figured out that in a lot of cases, the score > from remote data node was not populated, so the fallback to > sortByProximityWithScore never happened. That's why a single slow data node, > can cause huge problems to the whole cluster. > In this jira, I'd like to use zero as default score, so that we will get a > chance to try remote data node, if local one is slow. > I tested it in our test cluster, it improved the client latency in single > slow data node case significantly. > I flag this as a Bug, because it caused problems to our use cases multiple > times. > logs === > _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [0.0]_ > _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372232#comment-16372232 ] Dikang Gu edited comment on CASSANDRA-14252 at 2/22/18 10:31 PM: - |[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/14]| was (Author: dikanggu): |[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/13]| > Use zero as default score in DynamicEndpointSnitch > -- > > Key: CASSANDRA-14252 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14252 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > Fix For: 4.0, 3.11.x > > > The problem I want to solve is that I found in our deployment, one slow but > alive data node can slow down the whole cluster, even caused timeout of our > requests. > We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the > DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node > latency is too high. > I added some debug log, and figured out that in a lot of cases, the score > from remote data node was not populated, so the fallback to > sortByProximityWithScore never happened. That's why a single slow data node, > can cause huge problems to the whole cluster. > In this jira, I'd like to use zero as default score, so that we will get a > chance to try remote data node, if local one is slow. > I tested it in our test cluster, it improved the client latency in single > slow data node case significantly. > I flag this as a Bug, because it caused problems to our use cases multiple > times. > logs === > _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [0.0]_ > _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372232#comment-16372232 ] Dikang Gu edited comment on CASSANDRA-14252 at 2/22/18 10:30 PM: - |[trunk | https://github.com/DikangGu/cassandra/commit/fd00c51321f6252294b066a758bfaaa5ba810ea9]|[unit test | https://circleci.com/gh/DikangGu/cassandra/13]| was (Author: dikanggu): |[patch | https://github.com/DikangGu/cassandra/commit/6cfa9f1ee05a76dd4b510bff6f8bea347d118d42]|[unit test | https://circleci.com/gh/DikangGu/cassandra/9]| > Use zero as default score in DynamicEndpointSnitch > -- > > Key: CASSANDRA-14252 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14252 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > Fix For: 4.0, 3.11.x > > > The problem I want to solve is that I found in our deployment, one slow but > alive data node can slow down the whole cluster, even caused timeout of our > requests. > We are using DynamicEndpointSnitch, with badness_threshold 0.1. I expect the > DynamicEndpointSnitch switch to sortByProximityWithScore, if local data node > latency is too high. > I added some debug log, and figured out that in a lot of cases, the score > from remote data node was not populated, so the fallback to > sortByProximityWithScore never happened. That's why a single slow data node, > can cause huge problems to the whole cluster. > In this jira, I'd like to use zero as default score, so that we will get a > chance to try remote data node, if local one is slow. > I tested it in our test cluster, it improved the client latency in single > slow data node case significantly. > I flag this as a Bug, because it caused problems to our use cases multiple > times. > logs === > _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [0.0]_ > _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: > sortByProximityWithBadness: after sorting by proximity, addresses order > change to [ip1, ip2], with scores [1.0]_ > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org