[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582912#comment-14582912 ]

ASF subversion and git services commented on TS-3104:
-----------------------------------------------------

Commit ba0306c356ad4ec58c8ff77f120c61eaa229c6c9 in trafficserver's branch refs/heads/master from [~vleschuk]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=ba0306c ]

TS-3104: fix lockfile logic which decides whether to kill process or group

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Assignee: James Peach
Fix For: 6.0.0
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch

In some cases traffic_cop can't restart traffic_manager properly. We met these issues at Ashmanov and partners (http://en.ashmanov.com/).

There are two places in code which in my opinion need corrections:
1) The logic which decides whether to kill process or group.
2) The main traffic_cop loop: it doesn't reinitialize manager API in case of failure and this fact leads to constant attempts to connect to manager using socket id == -1.

I have prepared patches for both issues. Please kindly take a look at them and let me know your thoughts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
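For readers unfamiliar with the distinction in item 1 of the report, here is a minimal shell sketch of signalling a single process versus a whole process group. It is not the Traffic Server lockfile code and does not reproduce the patch; the helper names `kill_target` and `send_signal` are hypothetical and exist only for illustration. It relies on the standard kill(1) convention that a negative ID addresses a process group.

```shell
#!/bin/sh
# Hypothetical helpers; these names do not come from Traffic Server.

# Print the argument to pass to kill(1): the bare PID for a single
# process, or the negated process group ID for a whole group.
kill_target() {
    id=$1
    scope=$2   # "process" or "group"
    case "$scope" in
        process) printf '%s\n' "$id" ;;
        group)   printf '%s\n' "-$id" ;;
        *)       echo "unknown scope: $scope" >&2; return 1 ;;
    esac
}

# Send a signal (default TERM) to the chosen target.
# "--" stops option parsing so a negative ID is not read as a flag.
send_signal() {
    target=$(kill_target "$1" "$2") || return 1
    kill -s "${3:-TERM}" -- "$target"
}
```

For example, `send_signal 1234 group KILL` would signal every member of process group 1234, while `send_signal 1234 process` would signal only PID 1234. Choosing the wrong scope either leaves child processes behind or takes down more than intended, which is why that decision matters when a supervisor restarts its children.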
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582913#comment-14582913 ]

ASF subversion and git services commented on TS-3104:
-----------------------------------------------------

Commit 3a9a489108368ceb7ee9c23a867303c481f753dd in trafficserver's branch refs/heads/master from [~jpe...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=3a9a489 ]

Partially revert TS-3104

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Assignee: James Peach
Fix For: 6.0.0
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582917#comment-14582917 ]

James Peach commented on TS-3104:
---------------------------------

I applied the first patch, but I don't think the second is correct. The management socket is supposed to internally reconnect, so there should be no need to call TSInit more than once. There might be a bug in the management client code, however.

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Assignee: James Peach
Fix For: 6.0.0
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372009#comment-14372009 ]

Phil Sorber commented on TS-3104:
---------------------------------

Moving out to 6.0.0.

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Assignee: James Peach
Fix For: 6.0.0
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361591#comment-14361591 ]

Phil Sorber commented on TS-3104:
---------------------------------

[~jpe...@apache.org], Is this patch committable?

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Assignee: James Peach
Fix For: 5.3.0
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154736#comment-14154736 ]

Victor commented on TS-3104:
----------------------------

When the issue was reproduced, it showed up in syslog (journalctl) as numerous "unable to retrieve manager_binary" messages.

After applying the attached patches the issue was gone, and the processes were restarted correctly by traffic_cop. The following tests were made:

* kill `pgrep traffic_manager`
* kill -9 `pgrep traffic_manager`
* kill `pgrep traffic_server`
* kill -9 `pgrep traffic_server`
* kill `pgrep traffic_manager`; kill `pgrep traffic_server`
* kill -9 `pgrep traffic_manager`; kill -9 `pgrep traffic_server`

In all cases both traffic_manager and traffic_server were restarted correctly, and no endless loop of traffic_cop trying to restart the manager was seen.

traffic_cop can't restart traffic_manager properly
--------------------------------------------------

Key: TS-3104
URL: https://issues.apache.org/jira/browse/TS-3104
Project: Traffic Server
Issue Type: Bug
Components: Cop
Reporter: Victor
Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
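The six manual tests in the comment above could be scripted roughly as follows. This is only a sketch, not something attached to the issue: the function names, the `DRY_RUN` switch, and the 30-second `WAIT_SECS` settle time are all assumptions made for illustration, and the real restart delay depends on the traffic_cop configuration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the six kill scenarios from the comment. Defaults to a dry
# run that only prints commands; set DRY_RUN=0 to really send signals.
# WAIT_SECS is an assumed settle time, not a value from the report.

DRY_RUN=${DRY_RUN:-1}
WAIT_SECS=${WAIT_SECS:-30}

# Signal each named process with TERM (plain kill) or KILL (kill -9).
kill_procs() {
    sig=$1; shift
    for name in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "kill -s $sig \$(pgrep $name)"
        else
            # Unquoted on purpose: pgrep may print several PIDs.
            kill -s "$sig" $(pgrep "$name")
        fi
    done
}

# Succeeds only if both daemons are running.
both_running() {
    pgrep traffic_manager >/dev/null && pgrep traffic_server >/dev/null
}

# Run one scenario, then check that both processes came back.
run_case() {
    kill_procs "$@"
    [ "$DRY_RUN" = "1" ] && return 0
    sleep "$WAIT_SECS"
    both_running || { echo "FAIL: $*" >&2; return 1; }
}

# Manager only, server only, then both, each with TERM and KILL.
run_all() {
    for procs in traffic_manager traffic_server "traffic_manager traffic_server"; do
        for sig in TERM KILL; do
            # $procs is unquoted so the pair splits into two names.
            run_case "$sig" $procs || return 1
        done
    done
}
```

Sourcing the file and calling `run_all` prints the planned commands; `DRY_RUN=0 run_all` on a test host would send the signals and stop at the first scenario where either process fails to come back.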