[jira] [Updated] (TS-4801) Are we marking parents down too aggressively?

2016-08-31 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-4801:
--
Fix Version/s: 7.0.0

> Are we marking parents down too aggressively?
> -
>
> Key: TS-4801
> URL: https://issues.apache.org/jira/browse/TS-4801
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Parent Proxy
>Reporter: Leif Hedstrom
> Fix For: 7.0.0
>
>
> [~jrushford] and I were looking at some code, and found this piece which 
> looks suspicious:
> {code}
> } else if (s->current.attempts < s->txn_conf->parent_connect_attempts) {
>   s->current.attempts++;
>   // Are we done with this particular parent?
>   if ((s->current.attempts - 1) % 
> s->http_config_param->per_parent_connect_attempts != 0) {
> // No we are not done with this parent so retry
> s->next_action = how_to_open_connection(s);
> DebugTxn("http_trans", "%s Retrying parent for attempt %d, max %" 
> PRId64, "[handle_response_from_parent]",
>  s->current.attempts, 
> s->http_config_param->per_parent_connect_attempts);
> return;
>   } else {
> DebugTxn("http_trans", "%s %d per parent attempts exhausted", 
> "[handle_response_from_parent]", s->current.attempts);
> // Only mark the parent down if we failed to connect
> //  to the parent otherwise slow origin servers cause
> //  us to mark the parent down
> if (s->current.state == CONNECTION_ERROR) {
>   s->parent_params->markParentDown(&s->parent_result);
> }
> // We are done so look for another parent if any
> next_lookup = find_server_and_update_current_info(s);
>   }
> } else {
>   // Done trying parents... fail over to origin server if that is
>   //   appropriate
>   DebugTxn("http_trans", "[handle_response_from_parent] Error. No more 
> retries.");
>   s->parent_params->markParentDown(&s->parent_result);
>   s->parent_result.result = PARENT_FAIL;
>   next_lookup = find_server_and_update_current_info(s);
> }
> {code}
> Here's my thinking: It seems that if we exhaust parent_connect_attempts 
> (which is proxy.config.http.parent_proxy.total_connect_attempts, default 4), 
> we end up always marking whatever the current parent is as potentially down.
> This seems wrong, because it might do this even if the number of tries 
> against that particular parent has not been exhausted (that is the 
> proxy.config.http.parent_proxy.per_parent_connect_attempts setting, tested 
> inside the loop).
> Looking at this, it feels that we should either remove that markParentDown() 
> entirely, or at a minimum add the same checks as above, i.e.
> {code}
> -  s->parent_params->markParentDown(&s->parent_result);
> +  if (s->current.state == CONNECTION_ERROR) {
> +s->parent_params->markParentDown(&s->parent_result);
> +  }
> {code}
> I'm still doing some more tests, one concern is that it's unclear if the 
> earlier comment is correct, and that we can detect the difference between a 
> connection failure to parent, vs an origin server response to the parent is 
> just being slow (resulting in a timeout and no response to child).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4801) Are we marking parents down too aggressively?

2016-09-02 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-4801:
--
Backport to Version: 6.2.1

> Are we marking parents down too aggressively?
> -
>
> Key: TS-4801
> URL: https://issues.apache.org/jira/browse/TS-4801
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Parent Proxy
>Reporter: Leif Hedstrom
>Assignee: Leif Hedstrom
> Fix For: 7.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [~jrushford] and I were looking at some code, and found this piece which 
> looks suspicious:
> {code}
> } else if (s->current.attempts < s->txn_conf->parent_connect_attempts) {
>   s->current.attempts++;
>   // Are we done with this particular parent?
>   if ((s->current.attempts - 1) % 
> s->http_config_param->per_parent_connect_attempts != 0) {
> // No we are not done with this parent so retry
> s->next_action = how_to_open_connection(s);
> DebugTxn("http_trans", "%s Retrying parent for attempt %d, max %" 
> PRId64, "[handle_response_from_parent]",
>  s->current.attempts, 
> s->http_config_param->per_parent_connect_attempts);
> return;
>   } else {
> DebugTxn("http_trans", "%s %d per parent attempts exhausted", 
> "[handle_response_from_parent]", s->current.attempts);
> // Only mark the parent down if we failed to connect
> //  to the parent otherwise slow origin servers cause
> //  us to mark the parent down
> if (s->current.state == CONNECTION_ERROR) {
>   s->parent_params->markParentDown(&s->parent_result);
> }
> // We are done so look for another parent if any
> next_lookup = find_server_and_update_current_info(s);
>   }
> } else {
>   // Done trying parents... fail over to origin server if that is
>   //   appropriate
>   DebugTxn("http_trans", "[handle_response_from_parent] Error. No more 
> retries.");
>   s->parent_params->markParentDown(&s->parent_result);
>   s->parent_result.result = PARENT_FAIL;
>   next_lookup = find_server_and_update_current_info(s);
> }
> {code}
> Here's my thinking: It seems that if we exhaust parent_connect_attempts 
> (which is proxy.config.http.parent_proxy.total_connect_attempts, default 4), 
> we end up always marking whatever the current parent is as potentially down.
> This seems wrong, because it might do this even if the number of tries 
> against that particular parent has not been exhausted (that is the 
> proxy.config.http.parent_proxy.per_parent_connect_attempts setting, tested 
> inside the loop).
> Looking at this, it feels that we should either remove that markParentDown() 
> entirely, or at a minimum add the same checks as above, i.e.
> {code}
> -  s->parent_params->markParentDown(&s->parent_result);
> +  if (s->current.state == CONNECTION_ERROR) {
> +s->parent_params->markParentDown(&s->parent_result);
> +  }
> {code}
> I'm still doing some more tests, one concern is that it's unclear if the 
> earlier comment is correct, and that we can detect the difference between a 
> connection failure to parent, vs an origin server response to the parent is 
> just being slow (resulting in a timeout and no response to child).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4801) Are we marking parents down too aggressively?

2016-11-09 Thread Phil Sorber (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Sorber updated TS-4801:

Backport to Version:   (was: 6.2.1)

> Are we marking parents down too aggressively?
> -
>
> Key: TS-4801
> URL: https://issues.apache.org/jira/browse/TS-4801
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Parent Proxy
>Reporter: Leif Hedstrom
>Assignee: Leif Hedstrom
> Fix For: 6.2.1, 7.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [~jrushford] and I were looking at some code, and found this piece which 
> looks suspicious:
> {code}
> } else if (s->current.attempts < s->txn_conf->parent_connect_attempts) {
>   s->current.attempts++;
>   // Are we done with this particular parent?
>   if ((s->current.attempts - 1) % 
> s->http_config_param->per_parent_connect_attempts != 0) {
> // No we are not done with this parent so retry
> s->next_action = how_to_open_connection(s);
> DebugTxn("http_trans", "%s Retrying parent for attempt %d, max %" 
> PRId64, "[handle_response_from_parent]",
>  s->current.attempts, 
> s->http_config_param->per_parent_connect_attempts);
> return;
>   } else {
> DebugTxn("http_trans", "%s %d per parent attempts exhausted", 
> "[handle_response_from_parent]", s->current.attempts);
> // Only mark the parent down if we failed to connect
> //  to the parent otherwise slow origin servers cause
> //  us to mark the parent down
> if (s->current.state == CONNECTION_ERROR) {
>   s->parent_params->markParentDown(&s->parent_result);
> }
> // We are done so look for another parent if any
> next_lookup = find_server_and_update_current_info(s);
>   }
> } else {
>   // Done trying parents... fail over to origin server if that is
>   //   appropriate
>   DebugTxn("http_trans", "[handle_response_from_parent] Error. No more 
> retries.");
>   s->parent_params->markParentDown(&s->parent_result);
>   s->parent_result.result = PARENT_FAIL;
>   next_lookup = find_server_and_update_current_info(s);
> }
> {code}
> Here's my thinking: It seems that if we exhaust parent_connect_attempts 
> (which is proxy.config.http.parent_proxy.total_connect_attempts, default 4), 
> we end up always marking whatever the current parent is as potentially down.
> This seems wrong, because it might do this even if the number of tries 
> against that particular parent has not been exhausted (that is the 
> proxy.config.http.parent_proxy.per_parent_connect_attempts setting, tested 
> inside the loop).
> Looking at this, it feels that we should either remove that markParentDown() 
> entirely, or at a minimum add the same checks as above, i.e.
> {code}
> -  s->parent_params->markParentDown(&s->parent_result);
> +  if (s->current.state == CONNECTION_ERROR) {
> +s->parent_params->markParentDown(&s->parent_result);
> +  }
> {code}
> I'm still doing some more tests, one concern is that it's unclear if the 
> earlier comment is correct, and that we can detect the difference between a 
> connection failure to parent, vs an origin server response to the parent is 
> just being slow (resulting in a timeout and no response to child).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-4801) Are we marking parents down too aggressively?

2016-11-09 Thread Phil Sorber (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Sorber updated TS-4801:

Fix Version/s: 6.2.1

> Are we marking parents down too aggressively?
> -
>
> Key: TS-4801
> URL: https://issues.apache.org/jira/browse/TS-4801
> Project: Traffic Server
>  Issue Type: Bug
>  Components: Parent Proxy
>Reporter: Leif Hedstrom
>Assignee: Leif Hedstrom
> Fix For: 6.2.1, 7.0.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [~jrushford] and I were looking at some code, and found this piece which 
> looks suspicious:
> {code}
> } else if (s->current.attempts < s->txn_conf->parent_connect_attempts) {
>   s->current.attempts++;
>   // Are we done with this particular parent?
>   if ((s->current.attempts - 1) % 
> s->http_config_param->per_parent_connect_attempts != 0) {
> // No we are not done with this parent so retry
> s->next_action = how_to_open_connection(s);
> DebugTxn("http_trans", "%s Retrying parent for attempt %d, max %" 
> PRId64, "[handle_response_from_parent]",
>  s->current.attempts, 
> s->http_config_param->per_parent_connect_attempts);
> return;
>   } else {
> DebugTxn("http_trans", "%s %d per parent attempts exhausted", 
> "[handle_response_from_parent]", s->current.attempts);
> // Only mark the parent down if we failed to connect
> //  to the parent otherwise slow origin servers cause
> //  us to mark the parent down
> if (s->current.state == CONNECTION_ERROR) {
>   s->parent_params->markParentDown(&s->parent_result);
> }
> // We are done so look for another parent if any
> next_lookup = find_server_and_update_current_info(s);
>   }
> } else {
>   // Done trying parents... fail over to origin server if that is
>   //   appropriate
>   DebugTxn("http_trans", "[handle_response_from_parent] Error. No more 
> retries.");
>   s->parent_params->markParentDown(&s->parent_result);
>   s->parent_result.result = PARENT_FAIL;
>   next_lookup = find_server_and_update_current_info(s);
> }
> {code}
> Here's my thinking: It seems that if we exhaust parent_connect_attempts 
> (which is proxy.config.http.parent_proxy.total_connect_attempts, default 4), 
> we end up always marking whatever the current parent is as potentially down.
> This seems wrong, because it might do this even if the number of tries 
> against that particular parent has not been exhausted (that is the 
> proxy.config.http.parent_proxy.per_parent_connect_attempts setting, tested 
> inside the loop).
> Looking at this, it feels that we should either remove that markParentDown() 
> entirely, or at a minimum add the same checks as above, i.e.
> {code}
> -  s->parent_params->markParentDown(&s->parent_result);
> +  if (s->current.state == CONNECTION_ERROR) {
> +s->parent_params->markParentDown(&s->parent_result);
> +  }
> {code}
> I'm still doing some more tests, one concern is that it's unclear if the 
> earlier comment is correct, and that we can detect the difference between a 
> connection failure to parent, vs an origin server response to the parent is 
> just being slow (resulting in a timeout and no response to child).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)