[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors

2015-09-02 Thread Alan M. Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727894#comment-14727894
 ] 

Alan M. Carroll edited comment on TS-3848 at 9/2/15 7:37 PM:
-

The more I look at Pushkar's patch, the more I think [~zwoop] is wrong.

The question is, what should be the behavior for {{wait_for_cache==1}}? Leif 
wants this to be an error for an empty {{storage.config}} because in that case 
ATS will come up without any cache and the origin will be pounded. But how is 
that really different from a valid {{storage.config}} whose disks are not 
physically available? Won't this lead to the same issue? That would mean ATS 
shouldn't come up in that case either, but then what is the difference between 
1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean 
"don't run if no storage configured, but do run if there's no cache because the 
disks are dead"? And what if {{storage.config}} isn't empty but has a typo in 
the device path? Should that fail for {{1}} or only the empty 
{{storage.config}}?

IMHO either you're willing to run without cache, or you're not. It doesn't seem 
relevant whether ATS is without cache because of a bad {{storage.config}} or 
hardware/operating system problems. Leif should just set {{wait_for_cache}} to 
{{2}} to say "must have some cache".

0 - run regardless of cache, accept inbound ASAP
1 - run regardless of cache, accept inbound after cache initialization
2 - run only if some cache, accept inbound after cache initialization
3 - run only if all configured cache, accept inbound after cache initialization
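A minimal sketch of how the four proposed {{wait_for_cache}} levels could drive the startup decision. It assumes the cache layer can report how many spans (disks) {{storage.config}} lists and how many actually initialized; {{startup_action}}, {{Action}} and the span counts are hypothetical names for illustration, not ATS code. It also makes the argument above concrete: an empty {{storage.config}} and a config whose disks are all dead both arrive here as "zero ready spans".

```python
from enum import Enum


class Action(Enum):
    """What the server does once cache initialization has been attempted."""
    ABORT = "abort"                            # refuse to run at all
    LISTEN_NOW = "listen_now"                  # inbound accepted ASAP
    LISTEN_AFTER_CACHE = "listen_after_cache"  # inbound accepted after cache init


def startup_action(wait_for_cache: int, configured_spans: int, ready_spans: int) -> Action:
    """Hypothetical mapping of the proposed wait_for_cache levels 0-3.

    configured_spans: spans listed in storage.config (0 if the file is empty).
    ready_spans: spans that initialized successfully (0 if every disk is dead).
    """
    if not 0 <= wait_for_cache <= 3:
        raise ValueError(f"wait_for_cache must be 0-3, got {wait_for_cache}")
    if not 0 <= ready_spans <= configured_spans:
        raise ValueError("ready_spans must be between 0 and configured_spans")

    if wait_for_cache == 0:
        # Run regardless of cache; do not wait for cache initialization.
        return Action.LISTEN_NOW
    if wait_for_cache == 1:
        # Run regardless of cache, but only after cache initialization.
        return Action.LISTEN_AFTER_CACHE
    if wait_for_cache == 2:
        # "Must have some cache": an empty storage.config and all-dead disks
        # both leave ready_spans == 0, so they are treated identically.
        return Action.LISTEN_AFTER_CACHE if ready_spans > 0 else Action.ABORT
    # wait_for_cache == 3: every configured span must be up, and there must
    # be at least one (so an empty storage.config still aborts).
    if ready_spans > 0 and ready_spans == configured_spans:
        return Action.LISTEN_AFTER_CACHE
    return Action.ABORT
```

For example, {{startup_action(2, 0, 0)}} (empty {{storage.config}}) and {{startup_action(2, 2, 0)}} (two configured disks, both dead) both return {{Action.ABORT}}, while {{startup_action(1, 2, 0)}} still returns {{Action.LISTEN_AFTER_CACHE}}.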



was (Author: amc):
The more I look at Pushkar's patch, the more I think [~zwoop] is wrong.

The question is, what should be the behavior for {{wait_for_cache==1}}? Leif 
wants this to be an error for an empty {{storage.config}} because in that case 
ATS will come up without any cache and the origin will be pounded. But how is 
that really different from a valid {{storage.config}} whose disks are not 
physically available? Won't this lead to the same issue? That would mean ATS 
shouldn't come up in that case either, but then what is the difference between 
1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean 
"don't run if no storage configured, but do run if there's no cache because the 
disks are dead"? And what if {{storage.config}} isn't empty but has a typo in 
the device path? Should that fail for {{1}} or only the empty 
{{storage.config}}?

IMHO either you're willing to run without cache, or you're not. It doesn't seem 
relevant whether ATS is without cache because of a bad {{storage.config}} or 
hardware/operating system problems. Leif should just set {{wait_for_cache}} to 
{{2}} to say "must have some cache".

> ATS runs without cache or partial cache on disk errors
> --
>
> Key: TS-3848
> URL: https://issues.apache.org/jira/browse/TS-3848
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Reporter: Pushkar Pradhan
> Assignee: Alan M. Carroll
> Fix For: 6.1.0
>
>
> Problem:
> If ATS fails to initialize the cache (none of the disks were accessible), the 
> behavior depends on proxy.config.http.wait_for_cache:
> If wait_for_cache = 0, it will listen for requests and serve the requests (by 
> fetching from origin/parent/peer). 
> If wait_for_cache = 1, it will never listen for requests. This is almost like 
> a hang.
> We would like to change this so that we can take some action when the cache 
> fails to initialize (even partially):
> Proposed Solution:
> Define a new variable: proxy.config.http.cache.required
> Value range: 0-2
> 0 (default) - Do nothing
> 1 - Abort trafficserver if it failed to initialize every disk/volume
> 2 - Abort trafficserver if it failed to initialize even one of the disks or 
> volumes.
> Preconditions for this new behavior are:
> proxy.config.http.cache.http = 1 (HTTP caching enabled) and 
> proxy.config.http.wait_for_cache = 1.
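A sketch of the abort check the proposed proxy.config.http.cache.required setting describes, gated on the stated preconditions (HTTP caching enabled and wait_for_cache = 1). The function and parameter names are hypothetical, and treating an empty storage.config (zero configured spans) as "no cache" is an assumption, not something the proposal states.

```python
def should_abort(cache_required: int, http_cache_enabled: bool,
                 wait_for_cache: int, configured_spans: int,
                 failed_spans: int) -> bool:
    """Return True if startup should be aborted under the proposed setting.

    cache_required mirrors the proposed proxy.config.http.cache.required:
      0 - do nothing
      1 - abort if every disk/volume failed to initialize
      2 - abort if even one disk/volume failed to initialize
    """
    if cache_required not in (0, 1, 2):
        raise ValueError(f"cache_required must be 0-2, got {cache_required}")
    if not 0 <= failed_spans <= configured_spans:
        raise ValueError("failed_spans must be between 0 and configured_spans")

    # Preconditions from the proposal: HTTP caching is enabled and
    # wait_for_cache == 1; otherwise the new setting has no effect.
    if not http_cache_enabled or wait_for_cache != 1:
        return False

    usable_spans = configured_spans - failed_spans
    if cache_required == 1:
        # No usable cache at all. Assumption: an empty storage.config
        # (zero configured spans) also counts as "no cache".
        return usable_spans == 0
    if cache_required == 2:
        # Any failed span, or no usable cache at all, is fatal.
        return failed_spans > 0 or usable_spans == 0
    return False
```

With two configured disks of which one failed, level 1 keeps running ({{should_abort(1, True, 1, 2, 1)}} is False) while level 2 aborts ({{should_abort(2, True, 1, 2, 1)}} is True).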



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors

2015-09-02 Thread Alan M. Carroll (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727894#comment-14727894
 ] 

Alan M. Carroll edited comment on TS-3848 at 9/2/15 7:35 PM:
-

The more I look at Pushkar's patch, the more I think [~zwoop] is wrong.

The question is, what should be the behavior for {{wait_for_cache==1}}? Leif 
wants this to be an error for an empty {{storage.config}} because in that case 
ATS will come up without any cache and the origin will be pounded. But how is 
that really different from a valid {{storage.config}} whose disks are not 
physically available? Won't this lead to the same issue? That would mean ATS 
shouldn't come up in that case either, but then what is the difference between 
1 and 2 for {{wait_for_cache}}? Do we really want the value {{1}} to mean 
"don't run if no storage configured, but do run if there's no cache because the 
disks are dead"? And what if {{storage.config}} isn't empty but has a typo in 
the device path? Should that fail for {{1}} or only the empty 
{{storage.config}}?

IMHO either you're willing to run without cache, or you're not. It doesn't seem 
relevant whether ATS is without cache because of a bad {{storage.config}} or 
hardware/operating system problems. Leif should just set {{wait_for_cache}} to 
{{2}} to say "must have some cache".




[jira] [Comment Edited] (TS-3848) ATS runs without cache or partial cache on disk errors

2015-08-25 Thread Leif Hedstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/TS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711954#comment-14711954
 ] 

Leif Hedstrom edited comment on TS-3848 at 8/25/15 8:46 PM:


I disagree. I think wait_for_cache=1 with an empty storage.config is an error 
case, and we should *at least* emit serious Error() and/or Warning() messages 
for it. My preference would be to exit() with an error code.

For example, we have had cases where a bad config push delivered an empty 
storage.config. It's much better (in our case) to honor wait_for_cache=1 
and not let it proxy (because those boxes *would* kill the origin). But 
exiting with an error would be better.
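A minimal sketch of the startup check described above, assuming the check sees the raw text of storage.config and that '#' starts a comment in that file (a path containing '#' would be mis-read by this simple parser). storage_entries and check_storage are hypothetical helpers, not ATS functions; returning an exit code rather than calling exit() directly keeps the policy easy to test.

```python
import sys


def storage_entries(config_text: str) -> list[str]:
    """Return the non-blank, non-comment lines of a storage.config body.

    Assumes '#' starts a comment that runs to the end of the line.
    """
    entries = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            entries.append(line)
    return entries


def check_storage(config_text: str, wait_for_cache: int) -> int:
    """Return a process exit code: 0 to continue startup, 1 to refuse.

    Hypothetical policy: with wait_for_cache=1, an empty storage.config is a
    fatal error instead of a silent no-cache start that would send all
    traffic to the origin.
    """
    if wait_for_cache == 1 and not storage_entries(config_text):
        print("FATAL: wait_for_cache=1 but storage.config has no storage entries",
              file=sys.stderr)
        return 1
    return 0
```

For example, a comments-only file from a bad config push, check_storage("# pushed\n\n", wait_for_cache=1), returns 1, which a caller could hand to sys.exit().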


was (Author: zwoop):
I disagree. I think wait_for_cache=1 with an empty storage.config is an error 
case, and we should *at least* emit serious Error() and/or Warning() messages 
for it. My preference would be to exit() with an error code.

For example, we have had cases where a bad config push delivered an empty 
storage.config. It'd be much better (in our case) to honor wait_for_cache=1 
and not let it proxy (because those boxes *would* kill the origin).

> ATS runs without cache or partial cache on disk errors
> --
>
> Key: TS-3848
> URL: https://issues.apache.org/jira/browse/TS-3848
> Project: Traffic Server
> Issue Type: Bug
> Components: Cache
> Reporter: Pushkar Pradhan
> Assignee: Alan M. Carroll
> Fix For: 6.1.0
>
>
> Problem:
> If ATS fails to initialize one or more disks, it continues to run with 
> partial or no cache. This can cause origin overload.
> The situation can be somewhat mitigated by setting 
> proxy.config.http.wait_for_cache = 1, but only if all of the disks fail to 
> initialize.
> However, even if wait_for_cache = 1, when only one or a few disks fail to 
> initialize, ATS will continue to serve traffic.
> Proposed Solution:
> Define a new variable: proxy.config.http.cache.required
> Value range: 0-2
> 0 (default) - Do nothing
> 1 - Abort trafficserver if it failed to initialize every disk/volume
> 2 - Abort trafficserver if it failed to initialize even one of the disks or 
> volumes.
> If proxy.config.http.cache.http = 1, proxy.config.http.wait_for_cache = 1, 
> and proxy.config.http.cache.required > 0, then abort trafficserver if one 
> or more cache disks/volumes could not be initialized.


