[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors
[ https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727894#comment-14727894 ]

Alan M. Carroll edited comment on TS-3848 at 9/2/15 7:37 PM:

The more I look at Pushkar's patch, the more I think [~zwoop] is wrong. The question is, what should the behavior be for {{wait_for_cache==1}}? Leif wants this to be an error for an empty {{storage.config}} because in that case ATS will come up without any cache and the origin will be pounded. But how is that really different from a valid {{storage.config}} whose disks are not physically available? Won't this lead to the same issue? That means ATS shouldn't come up in that case either, but then what is the difference between 1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean "don't run if no storage configured, but do run if there's no cache because the disks are dead"? And what if {{storage.config}} isn't empty but has a typo in the device path? Should that fail for {{1}} or only the empty {{storage.config}}? IMHO either you're willing to run without cache, or you're not. It doesn't seem relevant whether ATS is without cache because of a bad {{storage.config}} or because of hardware/operating system problems. Leif should just set {{wait_for_cache}} to {{2}} to say "must have some cache".

0 - run regardless of cache, accept inbound ASAP
1 - run regardless of cache, accept inbound after cache initialization
2 - run only if some cache, accept inbound after cache initialization
3 - run only if all configured cache, accept inbound after cache initialization

was (Author: amc):
The more I look at Pushkar's patch, the more I think [~zwoop] is wrong. The question is, what should the behavior be for {{wait_for_cache==1}}? Leif wants this to be an error for an empty {{storage.config}} because in that case ATS will come up without any cache and the origin will be pounded.
But how is that really different from a valid {{storage.config}} whose disks are not physically available? Won't this lead to the same issue? That means ATS shouldn't come up in that case either, but then what is the difference between 1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean "don't run if no storage configured, but do run if there's no cache because the disks are dead"? And what if {{storage.config}} isn't empty but has a typo in the device path? Should that fail for {{1}} or only the empty {{storage.config}}? IMHO either you're willing to run without cache, or you're not. It doesn't seem relevant whether ATS is without cache because of a bad {{storage.config}} or because of hardware/operating system problems. Leif should just set {{wait_for_cache}} to {{2}} to say "must have some cache".

> ATS runs without cache or partial cache on disk errors
> --
>
> Key: TS-3848
> URL: https://issues.apache.org/jira/browse/TS-3848
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Reporter: Pushkar Pradhan
> Assignee: Alan M. Carroll
> Fix For: 6.1.0
>
>
> Problem:
> If ATS fails to initialize the cache (none of the disks were accessible), the
> behavior depends on proxy.config.http.wait_for_cache:
> If wait_for_cache = 0, it will listen for requests and serve the requests (by
> fetching from origin/parent/peer).
> If wait_for_cache = 1, it will never listen for requests. This is almost like
> a hang.
> We would like to change this so that we can take some action when the cache
> fails to initialize (even partially):
> Proposed Solution:
> Define a new variable: proxy.config.http.cache.required
> Value range: 0-2
> 0 (default) - Do nothing
> 1 - Abort trafficserver if it failed to initialize all the disks/volumes
> 2 - Abort trafficserver if it failed to initialize even one of the disks or
> volumes.
> Preconditions for this new behavior are:
> proxy.config.http.cache.required = 1 (HTTP caching enabled) and
> proxy.config.http.wait_for_cache = 1.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
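Alan's four-level list for {{wait_for_cache}} reads as a small decision table: given how many cache spans are configured and how many actually initialized, decide whether to run and whether to accept inbound connections before cache initialization finishes. The Python sketch below is illustrative only; {{decide}}, {{StartupDecision}}, and the {{configured}}/{{initialized}} span counts are hypothetical names, not Traffic Server code, and it assumes both counts are known once cache initialization has completed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupDecision:
    run: bool                  # False means abort instead of serving
    accept_before_cache: bool  # True means open inbound ports before cache init finishes


def decide(wait_for_cache: int, configured: int, initialized: int) -> StartupDecision:
    """Map the proposed wait_for_cache levels (0-3) to startup behavior.

    configured  -- number of spans listed in storage.config (hypothetical input)
    initialized -- how many of those spans came up successfully
    """
    if wait_for_cache not in (0, 1, 2, 3):
        raise ValueError(f"unsupported wait_for_cache value: {wait_for_cache}")
    if wait_for_cache == 0:
        # Run regardless of cache, accept inbound ASAP.
        return StartupDecision(run=True, accept_before_cache=True)
    if wait_for_cache == 1:
        # Run regardless of cache, but only after cache initialization.
        return StartupDecision(run=True, accept_before_cache=False)
    if wait_for_cache == 2:
        # "Must have some cache": zero usable spans is fatal, whatever the cause
        # (empty storage.config, bad path, or dead disks all look the same here).
        return StartupDecision(run=initialized > 0, accept_before_cache=False)
    # wait_for_cache == 3: every configured span must be usable. Assumption:
    # nothing configured also counts as a failure, so 3 is never weaker than 2.
    return StartupDecision(
        run=configured > 0 and initialized == configured,
        accept_before_cache=False,
    )
```

For example, {{decide(2, configured=4, initialized=0)}} yields {{run=False}}, while {{decide(1, 4, 0)}} runs but keeps inbound closed until initialization completes. The point of the sketch is Alan's argument: only the span counts matter, not the reason a span is missing.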
[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors
[ https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727894#comment-14727894 ]

Alan M. Carroll edited comment on TS-3848 at 9/2/15 7:35 PM:

The more I look at Pushkar's patch, the more I think [~zwoop] is wrong. The question is, what should the behavior be for {{wait_for_cache==1}}? Leif wants this to be an error for an empty {{storage.config}} because in that case ATS will come up without any cache and the origin will be pounded. But how is that really different from a valid {{storage.config}} whose disks are not physically available? Won't this lead to the same issue? That means ATS shouldn't come up in that case either, but then what is the difference between 1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean "don't run if no storage configured, but do run if there's no cache because the disks are dead"? And what if {{storage.config}} isn't empty but has a typo in the device path? Should that fail for {{1}} or only the empty {{storage.config}}? IMHO either you're willing to run without cache, or you're not. It doesn't seem relevant whether ATS is without cache because of a bad {{storage.config}} or because of hardware/operating system problems. Leif should just set {{wait_for_cache}} to {{2}} to say "must have some cache".

was (Author: amc):
The more I look at Pushkar's patch, the more I think [~zwoop] is wrong. The question is, what should the behavior be for {{wait_for_cache==1}}? Leif wants this to be an error for an empty {{storage.config}} because in that case ATS will come up without any cache and the origin will be pounded. But how is that really different from a valid {{storage.config}} whose disks are not physically available? Won't this lead to the same issue? That means ATS shouldn't come up in that case either, but then what is the difference between 1 and 2 for {{wait_for_cache}}?
Do we really want the value {{1}} to mean "don't run if no storage configured, but do run if there's no cache because the disks are dead"? And what if {{storage.config}} isn't empty but has a typo in the device path? Should that fail for {{1}} or only the empty {{storage.config}}?
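Alan's question (empty {{storage.config}} vs. a typo in a device path vs. dead disks) can be made concrete: if startup only counts usable spans, all three cases produce the same answer. The sketch below is a hypothetical illustration, not the ATS parser. It assumes each non-blank, non-comment line of {{storage.config}} starts with a span path (optionally followed by a size), and it stands in a caller-supplied {{is_available}} probe for the real device check.

```python
from typing import Callable, List


def usable_spans(storage_config: str, is_available: Callable[[str], bool]) -> List[str]:
    """Return the span paths from storage.config text that can actually be used.

    Assumption: each non-blank, non-comment line starts with a path; anything
    after the first whitespace (size, options) is ignored here.
    """
    spans = []
    for raw in storage_config.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments configure nothing
        path = line.split()[0]
        if is_available(path):  # a typo'd path and a dead disk both fail here
            spans.append(path)
    return spans
```

With a probe such as {{lambda p: p in {"/dev/sdb"}}}, an empty config, a typo like {{/dev/sbd}}, and a dead {{/dev/sdc}} all return an empty list; only {{/dev/sdb}} survives. A check like "no usable spans" therefore cannot distinguish Leif's bad config push from failed hardware, which is why Alan argues the policy should key on the outcome rather than the cause.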
[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors
[ https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711954#comment-14711954 ]

Leif Hedstrom edited comment on TS-3848 at 8/25/15 8:46 PM:

I disagree. I think wait_for_cache=1 with an empty storage.config is an error case, and we should *at least* give serious Error()'s and/or Warning()'s on it. My preference would be to exit() with an error code. For example, we have had cases where a bad config push pushed an empty storage.config. It's much better (in our case) to honor wait_for_cache=1 and not let it proxy (because those boxes *would* kill the origin). But exiting with an error would be better.

was (Author: zwoop):
I disagree. I think wait_for_cache=1 with an empty storage.config is an error case, and we should *at least* give serious Error()'s and/or Warning()'s on it. My preference would be to exit() with an error code. For example, we have had cases where a bad config push pushed an empty storage.config. It'd be much better (in our case) to honor wait_for_cache=1 and not let it proxy (because those boxes *would* kill the origin).

ATS runs without cache or partial cache on disk errors
--

Key: TS-3848
URL: https://issues.apache.org/jira/browse/TS-3848
Project: Traffic Server
Issue Type: Bug
Components: Cache
Reporter: Pushkar Pradhan
Assignee: Alan M. Carroll
Fix For: 6.1.0

Problem:
If ATS fails to initialize one or more disks, it continues to run without that cache. This can cause origin overload. The situation can be somewhat mitigated by setting proxy.config.http.wait_for_cache = 1, but only when none of the disks initialized. However, even with wait_for_cache = 1, if only one or a few disks failed to initialize, ATS will continue to serve traffic.
Proposed Solution:
Define a new variable: proxy.config.http.cache.required
Value range: 0-2
0 (default) - Do nothing
1 - Abort trafficserver if it failed to initialize all the disks/volumes
2 - Abort trafficserver if it failed to initialize even one of the disks or volumes.
If proxy.config.http.cache.required = 1 and proxy.config.http.wait_for_cache = 1 and if proxy.config.http.cache.required > 0, then abort the traffic server if one or more cache disks/volumes could not be initialized.
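The proposed proxy.config.http.cache.required check, together with its preconditions, can be sketched as a single predicate. This is a hypothetical Python illustration, not the patch under review: {{should_abort}} and its parameters are invented names, the "HTTP caching enabled" precondition is modeled as a plain boolean, and value 1 ("failed to initialize all the disks/volumes") is read as "no disk or volume initialized".

```python
def should_abort(cache_required: int, wait_for_cache: int, http_caching_enabled: bool,
                 configured: int, initialized: int) -> bool:
    """Return True if trafficserver should abort under the proposed setting.

    cache_required -- proposed value: 0 do nothing, 1 abort if every disk/volume
                      failed, 2 abort if any disk/volume failed
    configured, initialized -- hypothetical disk/volume counts after cache init
    """
    if cache_required not in (0, 1, 2):
        raise ValueError(f"unsupported cache.required value: {cache_required}")
    # Preconditions from the proposal: caching enabled and wait_for_cache == 1;
    # value 0 never aborts.
    if cache_required == 0 or not http_caching_enabled or wait_for_cache != 1:
        return False
    if cache_required == 1:
        return initialized == 0          # every disk/volume failed
    return initialized < configured      # value 2: any single failure is fatal
```

For instance, with four configured disks of which three initialized, value 1 keeps running and value 2 aborts. Read this way, values 1 and 2 appear to line up with levels 2 and 3 of Alan's proposed {{wait_for_cache}} list, which is essentially his suggestion of folding the new variable into {{wait_for_cache}}.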