[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Hi xavier,
while it is true that the PIDFile directive is not in those newer releases, as outlined by yutani in comment #6, the issue itself does not show up there (see comment #10).
I ran the test in Bionic again today and can confirm that it still does not trigger. So I have to assume it is fixed via something else (newer systemd?) in there.

If you have a case that can reproduce this, please outline the steps to do so and we can consider porting the fix to 17.10/18.04 as well.

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1644530

Title:
  keepalived fails to restart cleanly due to the wrong systemd settings

Status in keepalived package in Ubuntu:
  Fix Released
Status in systemd package in Ubuntu:
  Opinion
Status in keepalived source package in Xenial:
  Fix Released
Status in systemd source package in Xenial:
  Opinion
Status in keepalived package in Debian:
  Fix Released

Bug description:
  [Impact]

   * Restarts of keepalived can leave stale processes with the old
     configuration around.

   * The systemd detection of the MainPID is suboptimal. Combined with
     not waiting for signals to be handled, a second restart can fail
     and kill the (still) remaining process of the first start.

   * Upstream has a PIDFile statement, which has proven to avoid the
     issue in the MainPID guessing code of systemd.

  [Test Case]

   * Set up keepalived. The more complex the config, the "bigger" the
     race window; the trivial sample config further down in this
     description works well.

   * As a test, run the loop below, which restarts the service
     head-to-head while staying under the max-restart limit:

     $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done; done

     Expectation: no output other than timing.
     Without fix: sometimes the MainPID no longer exists; in these cases
     the child processes are the "old" ones from the last execution,
     still running with the old config.

  [Regression Potential]

   * Low because
     * A PIDFile statement is recommended by systemd for Type=forking
       services anyway.
     * Upstream keepalived has this statement in their service file.
     * By the kind of change, it should have no functional impact on
       other parts of the service apart from the PID detection of the
       job by systemd.

   * Yet regression potential is never zero. There might be the unlikely
     case of setups that were considered working only because a new
     config was not properly picked up. After the fix they will behave
     correctly and might then show up as false positives, e.g. if the
     config was bad.

  [Other Info]

   * Usually a fix has to be in at least the latest Development release
     before SRUing it. But as I outlined below, in releases later than
     Xenial systemd seems to have improved, making this change
     unnecessary. We haven't identified the bits for this (there is a
     bug task here), and they might well be very complex. I think it is
     correct to fix Xenial in this regard with the simple change to the
     service file for now.

   * To eventually match, I created a Debian bug task to ask them for
     the inclusion of the PIDFile so it can slowly trickle back down to
     newer Ubuntu releases. Also, people there more often run backports,
     where the issue might occur on older systemd versions (just as it
     does for us on Xenial).

  ---

  Because the "PIDFile=" directive is missing in the systemd unit file,
  keepalived sometimes fails to kill all old processes. The old
  processes remain with old settings and cause unexpected behaviors.

  The details of this bug are described in this upstream ticket:
  https://github.com/acassen/keepalived/issues/443.

  The official systemd unit file is available since version 1.2.24 by
  this commit:

  https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15

  This includes the "PIDFile" directive correctly:

  PIDFile=/var/run/keepalived.pid

  We should go the same way.

  I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.

  Package: keepalived
  Version: 1.2.19-1

  ===

  How to reproduce:

  I used two instances of Ubuntu 16.04.2 on DigitalOcean:

  Configurations
  --

  MASTER server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
       script "/bin/true"
       interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state BACKUP
      priority 100

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

  BACKUP server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
       script "/bin/true"
       interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state MASTER
      priority 200

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

  Loop based probing for the Error to exist:
  --

  After the setup above, start keepalived on both servers:
  $ sudo systemctl start keepalived.service
  Then run the following loop
  $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart [...]
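The bug description above attributes the problem to a unit file without a "PIDFile=" directive. As a rough way to check a given system for this, the following sketch tests whether a unit file sets the directive. The function name unit_has_pidfile and the default path /lib/systemd/system/keepalived.service are assumptions made for illustration; the path may differ depending on how the package ships its unit.

```shell
#!/bin/sh
# Sketch only: check whether a systemd unit file sets a non-empty PIDFile=.
# unit_has_pidfile is a hypothetical helper name, not part of any package.
unit_has_pidfile() {
    # Match a line of the form "PIDFile=<something>", allowing leading blanks.
    grep -Eq '^[[:space:]]*PIDFile=.+' "$1"
}

# Assumed default location of the unit file; pass another path as $1 if needed.
unit="${1:-/lib/systemd/system/keepalived.service}"

if [ -r "$unit" ]; then
    if unit_has_pidfile "$unit"; then
        echo "PIDFile is set in $unit"
    else
        echo "PIDFile is missing in $unit"
    fi
fi
```

This only inspects the file on disk. It does not tell whether the running systemd version is affected, which this thread leaves open for releases newer than Xenial.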
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Affects 17.10 and 18.04 as well.
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
** Changed in: keepalived (Debian)
       Status: New => Fix Released
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
> Anything else needed on src:systemd side of things?

Hi xnox,
Well, in a perfect world, and with a time machine to help, you could look into what in zesty's systemd made it work reliably even with the suboptimal service file.
But you have more important tasks all around you, and it is fixed where it was broken.

Until then I think "opinion" is more appropriate than "incomplete", yet whatever state it is in, it is ok to just let it hang around for now.
If similar issues come up for other services we might reconsider.

** Changed in: systemd (Ubuntu)
       Status: Incomplete => Opinion

** Changed in: systemd (Ubuntu Xenial)
       Status: Incomplete => Opinion
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Anything else needed on src:systemd side of things?

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

** Changed in: systemd (Ubuntu Xenial)
       Status: New => Incomplete
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
This bug was fixed in the package keepalived - 1:1.2.19-1ubuntu0.2

---
keepalived (1:1.2.19-1ubuntu0.2) xenial; urgency=medium

  * Add PIDFile to avoid misdetection of MainPID on restart (LP: #1644530).

 -- Christian Ehrhardt  Mon, 13 Mar 2017 13:23:47 +0100

** Changed in: keepalived (Ubuntu Xenial)
       Status: Fix Committed => Fix Released
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
** Changed in: systemd (Ubuntu)
     Assignee: Dimitri John Ledkov (xnox) => (unassigned)

** Changed in: systemd (Ubuntu)
    Milestone: ubuntu-17.03 => None
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Tested on Xenial:
  pre-Proposed: 15 hits
  with-Proposed: 0 hits

General regression checks also seem to work normally.
Setting verification-done.

Yutani - if you could also verify Proposed, that would make it even better!

** Tags removed: verification-needed
** Tags added: verification-done

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Fix Committed
Status in systemd source package in Xenial: New
Status in keepalived package in Debian: New
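The "hits" reported in this verification are the lines that the restart loop from the test case prints when the Main PID has already exited. A minimal sketch of how such a count could be collected is shown here; the helper name count_hits and the RUN_RESTART_LOOP guard variable are assumptions made for illustration and are not part of the original test case.

```shell
#!/bin/sh
# Count status lines that report a Main PID which has already exited.
# count_hits is a hypothetical helper; it reads status text on stdin.
count_hits() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    grep -Ec 'Main.*exited' || true
}

# Restart loop from the test case, guarded so that nothing is restarted
# unless RUN_RESTART_LOOP=1 is set (hypothetical variable). Needs root.
if [ "${RUN_RESTART_LOOP:-0}" = "1" ]; then
    for j in $(seq 1 20); do
        # Pause between batches to stay under the restart rate limit.
        sleep 11
        for i in $(seq 1 5); do
            systemctl restart keepalived
            systemctl status keepalived
        done
    done | count_hits
fi
```

With the guard enabled, the script prints a single number: the total of "Main PID ... exited" lines seen across all 100 restarts, which should be 0 once the fix is installed.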
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Hello Yutani, or anyone else affected,

Accepted keepalived into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/keepalived/1:1.2.19-1ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: keepalived (Ubuntu Xenial)
       Status: Triaged => Fix Committed

** Tags added: verification-needed

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Fix Committed
Status in systemd source package in Xenial: New
Status in keepalived package in Debian: New
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
** Changed in: keepalived (Debian)
       Status: Unknown => New

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Triaged
Status in systemd source package in Xenial: New
Status in keepalived package in Debian: New
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
I tested Yakkety to check where an SRU makes sense. I can confirm that there, with systemd 231-9ubuntu3, the loop restarts are already slow and work without the issue this bug is about.

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Triaged
Status in systemd source package in Xenial: New
Status in keepalived package in Debian: Unknown
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
Reported the request to pick up the PIDFile statement to Debian.
Linking up the debbug here.

** Bug watch added: Debian Bug tracker #857618
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857618

** Also affects: keepalived (Debian) via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857618
   Importance: Unknown
       Status: Unknown

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Triaged
Status in systemd source package in Xenial: New
Status in keepalived package in Debian: Unknown
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
While I'd like to understand what systemd changed, I still think adding the PIDFile is the correct fix for keepalived. Understanding the systemd change might open the door to fixing more than just this service.

Nevertheless, for Type=forking (which is the case here) PIDFile is recommended (https://www.freedesktop.org/software/systemd/man/systemd.service.html). The documentation lists various cases in which the MainPID guessing might otherwise fail. It is also proven to fix the issue, and it matches the upstream service file.

Next week I'll schedule a fix and also file one with Debian so that we don't keep that delta forever.

** Changed in: keepalived (Ubuntu Xenial)
       Status: Confirmed => Triaged

** Changed in: keepalived (Ubuntu Xenial)
     Assignee: (unassigned) => ChristianEhrhardt (paelzer)

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Triaged
Status in systemd source package in Xenial: New
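For hosts that cannot take an updated package yet, the PIDFile setting discussed in this thread could in principle be applied locally through a systemd drop-in rather than by editing the packaged unit file. The following is only a sketch under that assumption: the function name write_pidfile_dropin and the drop-in file name pidfile.conf are hypothetical, and the actual fix in this bug changes the packaged service file itself.

```shell
#!/bin/sh
# Write a drop-in that adds PIDFile= to keepalived.service.
# write_pidfile_dropin and "pidfile.conf" are hypothetical names.
# The optional first argument is the unit directory; it defaults to
# /etc/systemd/system and can be pointed at a scratch directory for a dry run.
write_pidfile_dropin() {
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/keepalived.service.d"
    mkdir -p "$dropin_dir" || return 1
    # The PID file path is the one used by the upstream service file.
    cat > "$dropin_dir/pidfile.conf" <<'EOF'
[Service]
PIDFile=/var/run/keepalived.pid
EOF
}
```

On a real host, calling write_pidfile_dropin as root would need to be followed by "systemctl daemon-reload" and a restart of keepalived before the setting takes effect.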
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
@Yutani - would you mind reporting the two diffs against upstream to Debian and mentioning the bug number here?
- After=syslog.target which might be reasonable to pick
- PIDFile=/var/run/keepalived.pid (which is not bad and we thought would

If you are unwilling or unable, let me know, but it would be a great help if you could do so.

For an SRU later this might get interesting. Usually an SRU requires the fix to be in the latest release - but it is "fixed" in Zesty, yet not with the code that we might SRU back then (PIDFile).
If possible, I'll wait for the systemd info to show up here to get the bigger picture.

Status in keepalived package in Ubuntu: Fix Released
Status in systemd package in Ubuntu: New
Status in keepalived source package in Xenial: Confirmed
Status in systemd source package in Xenial: New
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
I can confirm that the adding of a PIDFile as suggested makes it survive the looped test. We should try to understand the changes behind it working in zesty. But sooner or later adding the PIDFile might be the less invasive option to make it working in an SRU. -- You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu. https://bugs.launchpad.net/bugs/1644530 Title: keepalived fails to restart cleanly due to the wrong systemd settings Status in keepalived package in Ubuntu: Fix Released Status in systemd package in Ubuntu: New Status in keepalived source package in Xenial: Confirmed Status in systemd source package in Xenial: New Bug description: Because "PIDFile=" directive is missing in the systemd unit file, keepalived sometimes fails to kill all old processes. The old processes remain with old settings and cause unexpected behaviors. The detail of this bug is described in this ticket in upstream: https://github.com/acassen/keepalived/issues/443. The official systemd unit file is available since version 1.2.24 by this commit: https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15 This includes "PIDFile" directive correctly: PIDFile=/var/run/keepalived.pid We should go the same way. I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic. 
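The PIDFile change confirmed in this comment could be tried locally with a systemd drop-in instead of editing the packaged unit file. The following is only a minimal sketch: the PIDFile path is the one quoted from the upstream unit file, while the function name, the drop-in directory and the file name pidfile.conf are assumptions made for illustration and are not taken from the bug report or from the eventual package fix.

```shell
# Sketch: write a drop-in that adds the PIDFile= line quoted from upstream.
# The function name, default directory and file name are assumptions.
write_pidfile_dropin() {
    # Optional first argument overrides the drop-in directory (useful for a dry run).
    dropin_dir="${1:-/etc/systemd/system/keepalived.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # Only the [Service] section is extended; the packaged unit stays untouched.
    cat > "$dropin_dir/pidfile.conf" <<'EOF'
[Service]
PIDFile=/var/run/keepalived.pid
EOF
}
```

Called as root without an argument, this writes the drop-in under /etc/systemd/system; systemd picks it up after "systemctl daemon-reload", and the looped restart test from the bug description can then be repeated.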
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
** Changed in: systemd (Ubuntu)
    Milestone: None => ubuntu-17.03

** Changed in: systemd (Ubuntu)
     Assignee: (unassigned) => Dimitri John Ledkov (xnox)

-- 
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1644530

Title:
  keepalived fails to restart cleanly due to the wrong systemd settings

Status in keepalived package in Ubuntu:
  Fix Released
Status in systemd package in Ubuntu:
  New
Status in keepalived source package in Xenial:
  Confirmed
Status in systemd source package in Xenial:
  New

Bug description:
  Because "PIDFile=" directive is missing in the systemd unit file, keepalived sometimes fails to kill all old processes. The old processes remain with old settings and cause unexpected behaviors. The detail of this bug is described in this ticket in upstream: https://github.com/acassen/keepalived/issues/443.

  The official systemd unit file is available since version 1.2.24 by this commit:
  https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15

  This includes "PIDFile" directive correctly:

  PIDFile=/var/run/keepalived.pid

  We should go the same way.

  I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.
  Package: keepalived
  Version: 1.2.19-1

  ===

  How to reproduce:

  I used the two instances of Ubuntu 16.04.2 on DigitalOcean:

  Configurations
  --
  MASTER server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state BACKUP
      priority 100

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

  BACKUP server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state MASTER
      priority 200

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

  Loop based probing for the Error to exist:
  --
  After the setup above start keepalived on both servers:
  $ sudo systemctl start keepalived.service
  Then run the following loop
  $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done; done

  Expected: no error, only time reports
  Error case: Showing Main PID exited, details below

  Step by Step Procedures
  ---
  1) Start keepalived on both servers
  $ sudo systemctl start keepalived.service

  2) Restart keepalived on either one
  $ sudo systemctl restart keepalived.service

  3) Check status and PID
  $ systemctl status -n0 keepalived.service

  Result
  --
  0) Before restart
  Main PID is 3402 and the subprocesses' PIDs are 3403-3406. So far so good.
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
    Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 3403 (keepalived)
      Tasks: 3
     Memory: 1.7M
        CPU: 1.900s
     CGroup: /system.slice/keepalived.service
             ├─3403 /usr/sbin/keepalived
             ├─3405 /usr/sbin/keepalived
             └─3406 /usr/sbin/keepalived

  1) First restart
  Now Main PID is 3403, which was one of the previous subprocesses and has actually exited. Something is wrong. Yet the previous processes have all exited, so we are not likely to see any weird behavior here.

  root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:51:45 UTC; 1s ago
    Process: 4782 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 3403 (code=exited, status=0/SUCCESS)
      Tasks: 3
     Memory: 1.7M
        CPU: 11ms
     CGroup: /system.slice/keepalived.service
             ├─4783 /usr/sbin/keepalived
             ├─4784 /usr/sbin/keepalived
             └─4785 /usr/sbin/keepalived

  2) Second restart
  Now Main PID is 4783 and subprocesses' PIDs are 4783-4785. This is problematic as 4783 is the old process, which should have exited before new processes
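The step-by-step result above comes down to one question: does the main PID recorded by systemd still refer to a live process? As a rough sketch of an alternative to grepping the status output, the helpers below read the "MainPID=<n>" line printed by "systemctl show -p MainPID <unit>" and probe that PID. The function names are hypothetical, and how systemd 229 on Xenial reports MainPID while in the broken state was not verified here, so treat this as an illustration rather than a replacement for the egrep check in the reproduction loop.

```shell
# Extract the numeric PID from a "MainPID=<n>" line read on stdin.
parse_main_pid() {
    sed -n 's/^MainPID=\([0-9][0-9]*\)$/\1/p'
}

# Succeed if a process with the given PID exists.
# kill -0 sends no signal, but it needs permission to signal the target,
# so run this as root when checking a root-owned daemon.
pid_exists() {
    [ -n "$1" ] && [ "$1" -gt 0 ] 2>/dev/null && kill -0 "$1" 2>/dev/null
}

# Report whether the unit's recorded main PID is still running.
check_main_pid() {
    unit="${1:-keepalived.service}"
    pid="$(systemctl show -p MainPID "$unit" | parse_main_pid)"
    if pid_exists "$pid"; then
        echo "$unit: main PID $pid is alive"
    else
        echo "$unit: main PID '$pid' is missing or not running"
        return 1
    fi
}
```

A possible use is to call "check_main_pid keepalived.service" as root after each "systemctl restart keepalived" in the loop and to stop at the first non-zero exit status.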
[Touch-packages] [Bug 1644530] Re: keepalived fails to restart cleanly due to the wrong systemd settings
What I think is happening in our case:

Since no ExecStop= was specified, systemd will send SIGTERM [...]
Details: https://www.freedesktop.org/software/systemd/man/systemd.kill.html#

KillMode is "process" in the service file. That means "If set to process, only the main process itself is killed."

So in this case it relies on the signal being forwarded to the child processes. That takes time. If the next restart does not wait for that to complete, it sends the next SIGTERM, and this eliminates the main process (already in cleanup) before it can distribute the TERM to its children/siblings.

This is our error state. In this broken state
  Main PID: 10600 (code=exited, status=0/SUCCESS)
our KillMode=process might get special handling and kill all of them (since there is no main process to kill). That is the cleanup, which gets it back to work again.

Since the service files in both cases (X/Z) are the same, I wonder if there is a systemd change which fixes this by some sort of waiting for the signal to be handled (e.g. waiting for the MainPID to go away on its own).

Systemd versions:
Xenial: 229-4ubuntu16
Zesty: 232-18ubuntu1

** Description changed:

  Because "PIDFile=" directive is missing in the systemd unit file, keepalived sometimes fails to kill all old processes. The old processes remain with old settings and cause unexpected behaviors. The detail of this bug is described in this ticket in upstream: https://github.com/acassen/keepalived/issues/443.

  The official systemd unit file is available since version 1.2.24 by this commit:
  https://github.com/acassen/keepalived/commit/635ab69afb44cd8573663e62f292c6bb84b44f15

  This includes "PIDFile" directive correctly:

  PIDFile=/var/run/keepalived.pid

  We should go the same way.

  I am using Ubuntu 16.04.1, kernel 4.4.0-45-generic.
  Package: keepalived
  Version: 1.2.19-1

  ===

  How to reproduce:

  I used the two instances of Ubuntu 16.04.2 on DigitalOcean:

  Configurations
  --
  MASTER server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state BACKUP
      priority 100

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

  BACKUP server's /etc/keepalived/keepalived.conf:

    vrrp_script chk_nothing {
      script "/bin/true"
      interval 2
    }

    vrrp_instance G1 {
      interface eth1
      state MASTER
      priority 200

      virtual_router_id 123
      unicast_src_ip
      unicast_peer {
      }
      track_script {
        chk_nothing
      }
    }

- Procedures
- --
+ Loop based probing for the Error to exist:
+ --
+ After the setup above start keepalived on both servers:
+ $ sudo systemctl start keepalived.service
+ Then run the following loop
+ $ for j in $(seq 1 20); do sleep 11s; time for i in $(seq 1 5); do sudo systemctl restart keepalived; sudo systemctl status keepalived | egrep 'Main.*exited'; done; done
+ 
+ Expected: no error, only time reports
+ Error case: Showing Main PID exited, details below
+ 
+ Step by Step Procedures
+ ---
  1) Start keepalived on both servers
  $ sudo systemctl start keepalived.service

  2) Restart keepalived on either one
  $ sudo systemctl restart keepalived.service

  3) Check status and PID
  $ systemctl status -n0 keepalived.service

  Result
  --
  0) Before restart
  Main PID is 3402 and the subprocesses' PIDs are 3403-3406. So far so good.
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2017-03-04 01:37:12 UTC; 14min ago
    Process: 3402 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
   Main PID: 3403 (keepalived)
      Tasks: 3
     Memory: 1.7M
        CPU: 1.900s
     CGroup: /system.slice/keepalived.service
             ├─3403 /usr/sbin/keepalived
             ├─3405 /usr/sbin/keepalived
             └─3406 /usr/sbin/keepalived

  1) First restart
  Now Main PID is 3403, which was one of the previous subprocesses and has actually exited. Something is wrong. Yet the previous processes have all exited, so we are not likely to see any weird behavior here.

  root@ubuntu-2gb-sgp1-01:~# systemctl restart keepalived
  root@ubuntu-2gb-sgp1-01:~# systemctl status -n0 keepalived
  ● keepalived.service - Keepalive Daemon (LVS and VRRP)
     Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
     Active: