[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
Github user rob05c commented on a diff in the pull request: https://github.com/apache/incubator-trafficcontrol/pull/772#discussion_r131534318 --- Diff: traffic_monitor_golang/traffic_monitor/threadsafe/polledcaches.go --- @@ -103,11 +116,22 @@ func copyCaches(a map[enum.CacheName]struct{}) map[enum.CacheName]struct{} { return b } +func copyCachesTime(a map[enum.CacheName]time.Time) map[enum.CacheName]time.Time { + b := map[enum.CacheName]time.Time{} + for k, v := range a { + b[k] = v + } + return b +} + +const PolledBytesPerSecTimeout = time.Second * 60 --- End diff -- Changed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
Github user mtorluemke commented on a diff in the pull request: https://github.com/apache/incubator-trafficcontrol/pull/772#discussion_r131261159 --- Diff: traffic_monitor_golang/traffic_monitor/threadsafe/polledcaches.go --- @@ -103,11 +116,22 @@ func copyCaches(a map[enum.CacheName]struct{}) map[enum.CacheName]struct{} { return b } +func copyCachesTime(a map[enum.CacheName]time.Time) map[enum.CacheName]time.Time { + b := map[enum.CacheName]time.Time{} + for k, v := range a { + b[k] = v + } + return b +} + +const PolledBytesPerSecTimeout = time.Second * 60 --- End diff -- "Trouble communicating with a cache" should, by itself, mark the cache as unhealthy. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
Github user rob05c commented on a diff in the pull request: https://github.com/apache/incubator-trafficcontrol/pull/772#discussion_r131190134 --- Diff: traffic_monitor_golang/traffic_monitor/threadsafe/polledcaches.go --- @@ -103,11 +116,22 @@ func copyCaches(a map[enum.CacheName]struct{}) map[enum.CacheName]struct{} { return b } +func copyCachesTime(a map[enum.CacheName]time.Time) map[enum.CacheName]time.Time { + b := map[enum.CacheName]time.Time{} + for k, v := range a { + b[k] = v + } + return b +} + +const PolledBytesPerSecTimeout = time.Second * 60 --- End diff -- It makes me anxious that if we had trouble communicating with a cache which was overloaded with traffic for 10s, we'd tell the Router it's online and has 0 bytes/second and to send it lots of traffic. It'd take a bit more work to make it a config, that object in the code has no access to the config right now. I'm not a big fan of adding configs for every possible thing, though, every config option is more ops work and training. One doesn't seem like a big deal, but they add up. This should be a rare case; it just happens that we've allowed people to put a dysfunctional 'cache' on our CDN. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
Github user dneuman64 commented on a diff in the pull request: https://github.com/apache/incubator-trafficcontrol/pull/772#discussion_r131174165 --- Diff: traffic_monitor_golang/traffic_monitor/threadsafe/polledcaches.go --- @@ -103,11 +116,22 @@ func copyCaches(a map[enum.CacheName]struct{}) map[enum.CacheName]struct{} { return b } +func copyCachesTime(a map[enum.CacheName]time.Time) map[enum.CacheName]time.Time { + b := map[enum.CacheName]time.Time{} + for k, v := range a { + b[k] = v + } + return b +} + +const PolledBytesPerSecTimeout = time.Second * 60 --- End diff -- Can we make it a config? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
Github user mtorluemke commented on a diff in the pull request: https://github.com/apache/incubator-trafficcontrol/pull/772#discussion_r131169393 --- Diff: traffic_monitor_golang/traffic_monitor/threadsafe/polledcaches.go --- @@ -103,11 +116,22 @@ func copyCaches(a map[enum.CacheName]struct{}) map[enum.CacheName]struct{} { return b } +func copyCachesTime(a map[enum.CacheName]time.Time) map[enum.CacheName]time.Time { + b := map[enum.CacheName]time.Time{} + for k, v := range a { + b[k] = v + } + return b +} + +const PolledBytesPerSecTimeout = time.Second * 60 --- End diff -- Good concept -- I would pick something closer to 10 seconds, though. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-trafficcontrol pull request #772: Fix Traffic Monitor to startup w...
GitHub user rob05c opened a pull request: https://github.com/apache/incubator-trafficcontrol/pull/772 Fix Traffic Monitor to startup when cache bytes don't change This fixes an issue where TM refused to start if a cache's bytes in astats never change. TM returns 503 until it's polled every cache twice, and has a BytesPerSecond calculation. The issue was, if a cache had hard-coded proc.net.dev which never changed, TM has no way to know the cache simply hasn't updated its stats, and that we won't get a different stat on the next poll. This gives TM a timeout, once it successfully polls a cache once, if the stats don't change in 60s, we start up anyway. It's not reasonable for a cache to update stats less frequently than a minute. While guessing isn't ideal, there's no good alternative. We can't know the cache simply hasn't updated its stats internally; the only option is a timeout, or never starting. Incidentally, this problem occurred in production. We had a bad cache lying to us with hard-coded astats that never updated, causing TM to never start. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rob05c/incubator-trafficcontrol tm-fixstartupwithbadastats Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-trafficcontrol/pull/772.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #772 commit 4fd0b6d4375f8a78254aff970c9bdb696bc56bc5 Author: Robert Butts Date: 2017-08-03T14:29:00Z Fix TM2 to startup when cache bytes don't change This fixes an issue where TM refused to start if a cache's bytes in astats never change. TM returns 503 until it's polled every cache twice, and has a BytesPerSecond calculation. The issue was, if a cache had hard-coded proc.net.dev which never changed, TM has no way to know the cache simply hasn't updated its stats, and that we won't get a different stat on the next poll. This gives TM a timeout, once it successfully polls a cache once, if the stats don't change in 60s, we start up anyway. It's not reasonable for a cache to update stats less frequently than a minute. While guessing isn't ideal, there's no good alternative. We can't know the cache simply hasn't updated its stats internally; the only option is a timeout, or never starting. Incidentally, this problem occurred in production. We had a bad cache lying to us with hard-coded astats that never updated, causing TM to never start. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---