[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582917#comment-14582917 ]

James Peach commented on TS-3104:
---------------------------------

I applied the first patch, but I don't think the second is correct. The management socket is supposed to internally reconnect, so there should be no need to call TSInit more than once. There might be a bug in the management client code, however.

> traffic_cop can't restart traffic_manager properly
> --------------------------------------------------
>
>                 Key: TS-3104
>                 URL: https://issues.apache.org/jira/browse/TS-3104
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Victor
>            Assignee: James Peach
>             Fix For: 6.0.0
>
>         Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
>
>
> In some cases traffic_cop can't restart traffic_manager properly. We met these issues at "Ashmanov and partners" (http://en.ashmanov.com/). There are two places in code which in my opinion need corrections:
> 1) The logic which decides whether to kill process or group.
> 2) The main traffic_cop loop: it doesn't reinitialize manager API in case of failure and this fact leads to constant attempts to connect to manager using socket id == -1.
> I have prepared patches for both issues. Please kindly take a look at them and let me know your thoughts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
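To make the "internally reconnect" point concrete, here is a minimal sketch in Python of a client that owns its descriptor and re-establishes it on demand, so the caller initializes it once and never has to reinitialize after a failure. This is not the Traffic Server management API: `ManagementClient`, `connect`, and `request` are hypothetical names chosen for illustration, and the real client may behave differently.

```python
class ManagementClient:
    """Hypothetical client that owns its socket and reconnects on demand."""

    DISCONNECTED = -1

    def __init__(self, connect):
        # `connect` is an injected callable (an assumption of this sketch)
        # that returns a descriptor or raises OSError.
        self._connect = connect
        self.fd = self.DISCONNECTED

    def _ensure_connected(self):
        # Only try to connect when there is no usable descriptor.
        if self.fd == self.DISCONNECTED:
            try:
                self.fd = self._connect()
            except OSError:
                self.fd = self.DISCONNECTED
        return self.fd != self.DISCONNECTED

    def request(self, send):
        """Run send(fd); return its result, or None if the peer is unreachable."""
        if not self._ensure_connected():
            return None
        try:
            return send(self.fd)
        except OSError:
            # Drop the dead descriptor so the next call reconnects
            # instead of reusing -1 forever.
            self.fd = self.DISCONNECTED
            return None
```

With this shape, a monitoring loop can keep calling `request()` on the same object: a failed call leaves `fd` at -1, and the next call attempts a fresh connection rather than writing to the stale descriptor.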
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582913#comment-14582913 ]

ASF subversion and git services commented on TS-3104:
-----------------------------------------------------

Commit 3a9a489108368ceb7ee9c23a867303c481f753dd in trafficserver's branch refs/heads/master from [~jpe...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=3a9a489 ]

Partially revert TS-3104
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582912#comment-14582912 ]

ASF subversion and git services commented on TS-3104:
-----------------------------------------------------

Commit ba0306c356ad4ec58c8ff77f120c61eaa229c6c9 in trafficserver's branch refs/heads/master from [~vleschuk]
[ https://git-wip-us.apache.org/repos/asf?p=trafficserver.git;h=ba0306c ]

TS-3104: fix lockfile logic which decides whether to kill process or group
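The commit itself is not reproduced in this thread, so the following is only a sketch of what a "process or group" decision can look like, not the logic that was committed. It assumes one common rule: signal the whole process group only when the lockfile holder leads its own group and that group is not the caller's. `choose_target` and `kill_holder` are hypothetical names; `os.kill`, `os.killpg`, `os.getpgid`, and `os.getpgrp` are standard Python calls available on Unix.

```python
import os
import signal


def choose_target(pid, pgid, caller_pgid):
    """Return ("group", pgid) or ("process", pid) for a lockfile holder."""
    # Assumed rule: target the group only if the holder is a group leader
    # and the group is not ours, otherwise the caller would signal itself.
    if pgid == pid and pgid != caller_pgid:
        return ("group", pgid)
    return ("process", pid)


def kill_holder(pid, sig=signal.SIGTERM):
    """Signal the lockfile holder, or its whole group, and report which."""
    kind, target = choose_target(pid, os.getpgid(pid), os.getpgrp())
    if kind == "group":
        os.killpg(target, sig)
    else:
        os.kill(target, sig)
    return kind
```

Keeping the decision in a pure function makes it possible to check the three cases (group leader, non-leader, same group as the caller) without sending any signals.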
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372009#comment-14372009 ]

Phil Sorber commented on TS-3104:
---------------------------------

Moving out to 6.0.0.
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361591#comment-14361591 ]

Phil Sorber commented on TS-3104:
---------------------------------

[~jpe...@apache.org], is this patch committable?

> traffic_cop can't restart traffic_manager properly
> --------------------------------------------------
>
>                 Key: TS-3104
>                 URL: https://issues.apache.org/jira/browse/TS-3104
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Victor
>            Assignee: James Peach
>             Fix For: 5.3.0
>
>         Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
[jira] [Commented] (TS-3104) traffic_cop can't restart traffic_manager properly
[ https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154736#comment-14154736 ]

Victor commented on TS-3104:
----------------------------

When the issue was reproduced, it showed up in syslog (journalctl) as numerous "unable to retrieve manager_binary" messages.

After applying the attached patches the issue was gone, and the processes were restarted correctly by traffic_cop. The following tests were made:
* kill `pgrep traffic_manager`
* kill -9 `pgrep traffic_manager`
* kill `pgrep traffic_server`
* kill -9 `pgrep traffic_server`
* kill `pgrep traffic_manager`; kill `pgrep traffic_server`
* kill -9 `pgrep traffic_manager`; kill -9 `pgrep traffic_server`

In all cases both traffic_manager and traffic_server were restarted correctly, and no endless loop of traffic_cop trying to restart the manager was seen.

> traffic_cop can't restart traffic_manager properly
> --------------------------------------------------
>
>                 Key: TS-3104
>                 URL: https://issues.apache.org/jira/browse/TS-3104
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Victor
>         Attachments: ts-0022-fix-lockfile-killgroup.patch, ts-0023-cop-reinit-mgr-api-on-failure.patch
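For anyone repeating the checks, the six cases above are just two signals crossed with three sets of target processes. A small Python sketch that generates the same command lines is below; `kill_commands` is a hypothetical helper, and it only builds the strings, it does not run them or verify that traffic_cop restarted anything.

```python
from itertools import product

# Process sets and signal flags taken from the test list in this comment.
TARGET_SETS = [
    ("traffic_manager",),
    ("traffic_server",),
    ("traffic_manager", "traffic_server"),
]
SIGNAL_FLAGS = ["", "-9"]  # default SIGTERM, then SIGKILL


def kill_commands():
    """Return the shell command for each (targets, signal) combination."""
    commands = []
    for targets, flag in product(TARGET_SETS, SIGNAL_FLAGS):
        prefix = f"kill {flag} " if flag else "kill "
        commands.append("; ".join(f"{prefix}`pgrep {name}`" for name in targets))
    return commands


if __name__ == "__main__":
    for command in kill_commands():
        print(command)
```

After each command, the check described in the comment is that both processes come back and that syslog does not fill with "unable to retrieve manager_binary" messages.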