[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2015-03-20 Thread Phil Sorber (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Sorber updated TS-3104:

Fix Version/s: (was: 5.3.0)
   6.0.0

> traffic_cop can't restart traffic_manager properly
> --------------------------------------------------
>
>                 Key: TS-3104
>                 URL: https://issues.apache.org/jira/browse/TS-3104
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Cop
>            Reporter: Victor
>            Assignee: James Peach
>             Fix For: 6.0.0
>
>         Attachments: ts-0022-fix-lockfile-killgroup.patch,
>                      ts-0023-cop-reinit-mgr-api-on-failure.patch
>
>
> In some cases traffic_cop can't restart traffic_manager properly. We ran
> into these issues at "Ashmanov and partners" (http://en.ashmanov.com/).
> There are two places in the code which, in my opinion, need corrections:
> 1) The logic which decides whether to kill a single process or the whole
> process group.
> 2) The main traffic_cop loop: it doesn't reinitialize the manager API after
> a failure, which leads to constant attempts to connect to the manager using
> socket id == -1.
> I have prepared patches for both issues. Please kindly take a look at them
> and let me know your thoughts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2014-11-20 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3104:
---
Assignee: James Peach






[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2014-11-20 Thread Susan Hinrichs (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Susan Hinrichs updated TS-3104:
---
Fix Version/s: (was: 5.2.0)
   5.3.0






[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2014-10-01 Thread Leif Hedstrom (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leif Hedstrom updated TS-3104:
--
Fix Version/s: 5.2.0






[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2014-10-01 Thread Victor (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor updated TS-3104:
---
Description: 
In some cases traffic_cop can't restart traffic_manager properly. We ran into
these issues at "Ashmanov and partners" (http://en.ashmanov.com/). There are
two places in the code which, in my opinion, need corrections:

1) The logic which decides whether to kill a single process or the whole
process group.

2) The main traffic_cop loop: it doesn't reinitialize the manager API after a
failure, which leads to constant attempts to connect to the manager using
socket id == -1.

I have prepared patches for both issues. Please kindly take a look at them and
let me know your thoughts.


  was:
In some cases traffic_cop can't restart traffic_manager properly. We ran into
these issues at "Ashmanov and partners" (http://en.ashmanov.com/). There are
two places in the code which, in my opinion, need corrections:

1) The logic which decides whether to kill a single process or the whole
process group.

2) The main traffic_cop loop: it doesn't reinitialize the manager API after a
failure, which leads to constant attempts to connect to the manager using
socket id == -1.

I have prepared patches for both issues. Please kindly take a look at them and
let me know your thoughts.


diff --git lib/ts/lockfile.cc lib/ts/lockfile.cc
index f6e9587..dbd7394 100644
--- lib/ts/lockfile.cc
+++ lib/ts/lockfile.cc
@@ -241,6 +241,7 @@ Lockfile::KillGroup(int sig, int initial_sig, const char *pname)
   int err;
   pid_t pid;
   pid_t holding_pid;
+  pid_t self = getpid();
 
   err = Open(&holding_pid);
   if (err == 1) // success getting the lock file
@@ -252,12 +253,20 @@ Lockfile::KillGroup(int sig, int initial_sig, const char *pname)
       pid = getpgid(holding_pid);
     } while ((pid < 0) && (errno == EINTR));
 
-    if ((pid < 0) || (pid == getpid()))
+    if ((pid < 0) || (pid == self)) {
+      // Error getting process group,
+      // or we are the group's owner.
+      // Let's kill just holding_pid
       pid = holding_pid;
-
-    if (pid != 0) {
+    }
+    else if (pid != self) {
+      // We managed to get holding_pid's process group
+      // and this group is not ours.
       // This way, we kill the process_group:
       pid = -pid;
+    }
+
+    if (pid != 0) {
       // In order to get core files from each process, please
       // set your core_pattern appropriately.
       lockfile_kill_internal(holding_pid, initial_sig, pid, pname, sig);
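
The decision made by the lockfile patch can be read in isolation as a small pure function. The sketch below is only an illustration under stated assumptions: choose_kill_target and kill_target_examples are hypothetical names that do not exist in lockfile.cc, and the pid values are made up. It mirrors the patched branches: fall back to the lock holder alone when getpgid() fails or when we lead the holder's group, otherwise signal the whole group through a negative pid.

```cpp
#include <sys/types.h>
#include <cassert>

// Hypothetical helper, not part of lockfile.cc. Given the lock holder's pid,
// the result of getpgid(holding_pid), and our own pid, return the value to
// pass to kill(2): a positive pid signals one process, a negative value
// signals the whole process group.
pid_t choose_kill_target(pid_t holding_pid, pid_t pgid, pid_t self)
{
  if (pgid < 0 || pgid == self) {
    // Group lookup failed, or we lead the holder's group: signalling the
    // group would also hit us, so target the holder alone.
    return holding_pid;
  }
  // The holder lives in a different group: target the whole group.
  return -pgid;
}

// Walk-through of the three cases with made-up pids.
void kill_target_examples()
{
  assert(choose_kill_target(1200, 1200, 900) == -1200); // separate group
  assert(choose_kill_target(1200, 900, 900) == 1200);   // group led by us
  assert(choose_kill_target(1200, -1, 900) == 1200);    // getpgid() failed
}
```

As in the patch, a caller would still skip the kill when the returned value is 0.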




diff --git cop/TrafficCop.cc cop/TrafficCop.cc
index 307270e..56bc6d2 100644
--- cop/TrafficCop.cc
+++ cop/TrafficCop.cc
@@ -59,6 +59,7 @@ static const char COP_TRACE_FILE[] = "/tmp/traffic_cop.trace";
 
 #define COP_FATAL    LOG_ALERT
 #define COP_WARNING  LOG_ERR
+#define COP_INFO LOG_INFO
 #define COP_DEBUG    LOG_DEBUG
 
 Diags * g_diags; // link time dependency
@@ -131,6 +132,9 @@ static int child_pid = 0;
 static int child_status = 0;
 static int sem_id = 11452;
 
+// manager API is initialized
+static bool mgmt_init = false;
+
 AppVersionInfo appVersionInfo;
 
 static char const localhost[] = "127.0.0.1";
@@ -1142,6 +1146,7 @@ test_mgmt_cli_port()
 
   if (TSRecordGetString("proxy.config.manager_binary", &val) !=  TS_ERR_OKAY) {
     cop_log(COP_WARNING, "(cli test) unable to retrieve manager_binary\n");
+    mgmt_init = false;
     ret = -1;
   } else {
     if (strcmp(val, manager_binary) != 0) {
@@ -1544,7 +1549,6 @@ check_no_run()
 static void*
 check(void *arg)
 {
-  bool mgmt_init = false;
   cop_log_trace("Entering check()\n");
 
   for (;;) {
@@ -1593,6 +1597,7 @@ check(void *arg)
 
     // We do this after the first round of checks, since the first "check" will spawn traffic_manager
     if (!mgmt_init) {
+      cop_log(COP_INFO, "Initializing manager API\n");
       TSInit(Layout::get()->runtimedir, static_cast<TSInitOptionT>(TS_MGMT_OPT_NO_EVENTS | TS_MGMT_OPT_NO_SOCK_TESTS));
       mgmt_init = true;
     }
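
The idea behind the TrafficCop.cc patch is to move the "initialized" flag out of check() so that a failed query elsewhere can clear it, which makes the next loop pass initialize the API again. The following is a minimal sketch of that pattern only, not the real code: FakeManagerClient, check_once and reinit_example are hypothetical names, and the fake client merely stands in for TSInit() and TSRecordGetString().

```cpp
#include <cassert>

// Hypothetical stand-in for the management API connection (not the real mgmtapi).
struct FakeManagerClient {
  int init_calls = 0;
  bool connected = false;

  void init() { ++init_calls; connected = true; } // plays the role of TSInit()
  bool query() const { return connected; }        // plays the role of TSRecordGetString()
  void drop() { connected = false; }              // simulates traffic_manager going away
};

// File scope, so that a failed query anywhere can request re-initialization.
static bool mgmt_init = false;

// One pass of the monitoring loop. Returns false when the query failed.
bool check_once(FakeManagerClient &client)
{
  bool ok = true;
  if (mgmt_init && !client.query()) {
    mgmt_init = false; // stale connection: force init() below
    ok = false;
  }
  if (!mgmt_init) {
    client.init();
    mgmt_init = true;
  }
  return ok;
}

// Walk-through: initialize once, stay quiet while healthy, re-init after a failure.
void reinit_example()
{
  mgmt_init = false;
  FakeManagerClient client;

  bool ok = check_once(client); // first pass initializes
  assert(ok);
  assert(client.init_calls == 1);

  ok = check_once(client); // healthy: no re-init
  assert(ok);
  assert(client.init_calls == 1);

  client.drop();
  ok = check_once(client); // failure is reported and triggers re-init
  assert(!ok);
  assert(client.init_calls == 2);
}
```

With a flag local to check() and never cleared, the third pass would keep querying the dead connection, which matches the repeated connection attempts with socket id == -1 described in the report.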







[jira] [Updated] (TS-3104) traffic_cop can't restart traffic_manager properly

2014-10-01 Thread Victor (JIRA)

 [ 
https://issues.apache.org/jira/browse/TS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victor updated TS-3104:
---
Attachment: ts-0023-cop-reinit-mgr-api-on-failure.patch
ts-0022-fix-lockfile-killgroup.patch

Patches for described issues.



