Re: HAProxy and systemd compatibility
Hi Marc-Antoine,

On Fri, Feb 08, 2013 at 03:58:45PM +0100, Marc-Antoine Perennou wrote:
> Hi,
>
> Currently, to reload the haproxy configuration, you have to use -sf. The
> systemd philosophy is for daemons not to fork by themselves, but rather to
> let the init process do it for them. My first patch adds a new option -Ds
> which is exactly like -D, but instead of forking n times to get n jobs
> running and then exiting, waits for all the children it just created. With
> this done, haproxy becomes more systemd-compliant, without changing
> anything for other systems.
>
> Now comes a problem with the -sf way of doing things. First of all, in the
> systemd world, reload commands should be oneshot ones, which means they
> should not become the new main process but rather be a tool which makes a
> call to it and then exits. With the current approach, the reload command is
> the new main process and, moreover, it makes the previous one exit. Systemd
> only tracks the main process; seeing it end, it assumes it either finished
> or failed, and kills everything remaining as a garbage collector. We then
> end up with no haproxy running at all.
>
> My second patch is a wrapper around haproxy; no changes at all have been
> made to haproxy itself, so it's not intrusive and doesn't change anything
> for other hosts. What this wrapper does is basically launch haproxy as a
> child, listen for the SIGUSR2 signal (so as not to conflict with haproxy
> itself), and spawn a new haproxy with -sf as a child to replace the first
> one.
>
> The third patch is a logical continuation of the first two, providing a
> systemd service file.

Thanks for the details. It's nice to have a cover mail to explain what the
patches do; however, next time please don't forget that this first mail is
lost when committing patches, and that the relevant parts should be present
in their respective patches' commit messages.

I'm replying to your other mails individually to make tracking easier.

Willy
Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility
On Fri, Feb 08, 2013 at 03:58:46PM +0100, Marc-Antoine Perennou wrote:
> @@ -1493,8 +1499,13 @@ int main(int argc, char **argv)
>  			px = px->next;
>  		}
> -		if (proc == global.nbproc)
> +		if (proc == global.nbproc) {
> +			if (global.mode & MODE_SYSTEMD) {
> +				for (proc = 0; proc < global.nbproc; proc++)
> +					waitpid(children[proc], NULL, 0);
> +			}
>  			exit(0); /* parent must leave */
> +		}
>  		/* if we're NOT in QUIET mode, we should now close the 3 first FDs to ensure
>  		 * that we can detach from the TTY. We MUST NOT do it in other cases since

I think we have an issue here, which should not be too hard to fix. The
parent is the one which binds to all the ports, so here it still owns them.
This means that a soft restart of a new process will fail until the parent
leaves, since it will not release these ports. And the parent will not
leave until all of its children have terminated. Or maybe I'm missing
something :-/

Also, wouldn't it be worth catching EINTR here in the loop? Otherwise I
fear that random signals sent to the process will make the for loop
progress to the next child even though previous ones are not dead. This can
easily happen during debugging sessions (ctrl-Z/bg) or when people
mistakenly send a SIGHUP to the old process.

Willy
Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper
On Fri, Feb 08, 2013 at 03:58:47PM +0100, Marc-Antoine Perennou wrote:
> +static void usage(const char *progname)
> +{
> +	fprintf(stderr, "Usage: %s [-f cfgfile] [-p pidfile]\n", progname);

Here I think it would be worth supporting most of the other command line
arguments. For instance, -L is mandatory when session replication is
enabled between multiple peers. -C is used by people who have to load many
ACLs or certs and want to chdir there to load the files. -c is handy to
validate configurations so that they don't have to switch between two
different tools and syntaxes. I think we can ignore the various debugging
flags for now, however.

Regards,
Willy
Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility
On 9 February 2013 09:45, Willy Tarreau w...@1wt.eu wrote:
> On Fri, Feb 08, 2013 at 03:58:46PM +0100, Marc-Antoine Perennou wrote:
>> @@ -1493,8 +1499,13 @@ int main(int argc, char **argv)
>>  			px = px->next;
>>  		}
>> -		if (proc == global.nbproc)
>> +		if (proc == global.nbproc) {
>> +			if (global.mode & MODE_SYSTEMD) {
>> +				for (proc = 0; proc < global.nbproc; proc++)
>> +					waitpid(children[proc], NULL, 0);
>> +			}
>>  			exit(0); /* parent must leave */
>> +		}
>>  		/* if we're NOT in QUIET mode, we should now close the 3 first FDs to ensure
>>  		 * that we can detach from the TTY. We MUST NOT do it in other cases since
>
> I think we have an issue here, which should not be too hard to fix. The
> parent is the one which binds to all the ports, so here it still owns
> them. This means that a soft restart of a new process will fail until the
> parent leaves, since it will not release these ports. And the parent will
> not leave until all of its children have terminated. Or maybe I'm missing
> something :-/
>
> Also, wouldn't it be worth catching EINTR here in the loop? Otherwise I
> fear that random signals sent to the process will make the for loop
> progress to the next child even though previous ones are not dead. This
> can easily happen during debugging sessions (ctrl-Z/bg) or when people
> mistakenly send a SIGHUP to the old process.
>
> Willy

I just made a simple test: running a webserver serving a big file locally,
using haproxy, my wrapper and the systemd service. I started a download
and, during this download, reloaded haproxy. I am using nbproc = 2.

What happened? When I started haproxy, I ended up with a wrapper launching
a child which itself launched two children; we'll call them ha1, ha11 and
ha12. Then when I reloaded, the wrapper launched a new child which launched
two new children: ha2, ha21 and ha22. ha11 and thus ha1 were still there
until the download had finished, and ha12 went to zombie state. ha2, ha21
and ha22 successfully showed up and took all new connections.

Once the download had finished, ha11 exited, ha12 too (waitpid making it
leave the zombie state) and then ha1, leaving us with only ha2, ha21 and
ha22. I think this is the expected behaviour, so there doesn't seem to be
any bug here.

For the EINTR stuff, I'm not sure at all, as I'm not really familiar with
it, so I will give it a look.
Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper
I'll do even simpler: when submitting V2 of this patchset with the EINTR
issue addressed, I'll basically forward every arg given to the wrapper to
all the children themselves. The tool is not really meant to be used by the
user himself, though, but rather to be launched by the systemd service.

Is SIGUSR2 ok here? I first did it with SIGUSR1, but then children couldn't
bind to this signal on reload, since it was already a USR1 action, so I
took the first one not colliding.

On 9 February 2013 09:49, Willy Tarreau w...@1wt.eu wrote:
> On Fri, Feb 08, 2013 at 03:58:47PM +0100, Marc-Antoine Perennou wrote:
>> +static void usage(const char *progname)
>> +{
>> +	fprintf(stderr, "Usage: %s [-f cfgfile] [-p pidfile]\n", progname);
>
> Here I think it would be worth supporting most of the other command line
> arguments. For instance, -L is mandatory when session replication is
> enabled between multiple peers. -C is used by people who have to load
> many ACLs or certs and want to chdir there to load the files. -c is handy
> to validate configurations so that they don't have to switch between two
> different tools and syntaxes. I think we can ignore the various debugging
> flags for now, however.
>
> Regards,
> Willy
Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility
Hi,

On Sat, Feb 09, 2013 at 10:44:04AM +0100, Marc-Antoine Perennou wrote:
> I just made a simple test: running a webserver serving a big file
> locally, using haproxy, my wrapper and the systemd service. I started a
> download and, during this download, reloaded haproxy. I am using
> nbproc = 2.
>
> What happened? When I started haproxy, I ended up with a wrapper
> launching a child which itself launched two children; we'll call them
> ha1, ha11 and ha12. Then when I reloaded, the wrapper launched a new
> child which launched two new children: ha2, ha21 and ha22. ha11 and thus
> ha1 were still there until the download had finished, and ha12 went to
> zombie state.

Then if ha1 was still there, I don't understand how it did not prevent the
new process from binding to the port. Could you please check whether ha1 is
still bound to the port once the new process runs? That's what I don't
understand.

> ha2, ha21 and ha22 successfully showed up and took all new connections.
> Once the download had finished, ha11 exited, ha12 too (waitpid making it
> leave the zombie state) and then ha1, leaving us with only ha2, ha21 and
> ha22. I think this is the expected behaviour, so there doesn't seem to be
> any bug here.

Yes, it's the expected behaviour, but I don't understand *why* it works, so
it is very possible that we're having a bug somewhere else making this work
as a side effect.

> For the EINTR stuff, I'm not sure at all, as I'm not really familiar with
> it, so I will give it a look.

Typically I would replace:

	waitpid(pid, NULL, 0);

with:

	while (waitpid(pid, NULL, 0) == -1 && errno == EINTR);

For instance, when your process receives SIGTTOU/SIGTTIN upon a failed
attempt of a new process to start, the old one very likely skips a few
children (one per signal).

If it can help you, here's how to test for the worst case:

  - have a running process with a simple configuration bound to one port:

	listen foo
		bind :8000

  - then have a second configuration which will not work due to a double
    bind on the same port, and another bind on the first port:

	listen foo
		bind :8000
	listen fail1
		bind :8001
	listen fail2
		bind :8001

  - when the first one is running, try to start the second one. It will
    fail to bind to :8000, will send a SIGTTOU to process 1 and try again.
    Then it will fail to bind :8001 without knowing it's not because of #1,
    so it will wait a bit, believing it's #1 which has not yet released it,
    and then it will abort, sending SIGTTIN to #1 to inform it that it
    gives up and that #1 must continue its job as if nothing happened.

  - process 1 should then just remain unaffected. And restarting #1 with
    the fail2 listener commented out should work as expected.

Best regards,
Willy
Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper
On Sat, Feb 09, 2013 at 10:47:23AM +0100, Marc-Antoine Perennou wrote:
> I'll do even simpler: when submitting V2 of this patchset with the EINTR
> issue addressed, I'll basically forward every arg given to the wrapper to
> all the children themselves. The tool is not really meant to be used by
> the user himself, though, but rather to be launched by the systemd
> service.

I understand, but it's quite common to expect to reuse the same command
line everywhere.

> Is SIGUSR2 ok here? I first did it with SIGUSR1, but then children
> couldn't bind to this signal on reload, since it was already a USR1
> action, so I took the first one not colliding.

Six signals are used by haproxy:
  - SIGTTOU is used to make the old process temporarily release the ports
    it is bound to so that a new process may attempt a startup
  - SIGTTIN is used to tell the old process to rebind the ports that were
    unbound by SIGTTOU and continue its job as if nothing happened
  - SIGUSR1 is used to make the old process unbind and wait for the last
    connection to complete before exiting
  - SIGTERM is used to make the old process go away without waiting for
    connections to complete
  - SIGHUP is used to dump server states in the logs
  - SIGQUIT is used to dump memory pool usage to the logs

Additionally, I'd rather keep SIGUSR2 available for later use, because the
master-worker rework that Simon Horman did makes use of this signal to
restart the process and reopen config files. However, I think that your
usage of SIGUSR2 is exactly the same as for the master-worker mode, so it
seems to make a lot of sense to use it in the wrapper.

Willy
Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility
On 9 February 2013 11:06, Willy Tarreau w...@1wt.eu wrote:
> Hi,
>
> On Sat, Feb 09, 2013 at 10:44:04AM +0100, Marc-Antoine Perennou wrote:
>> I just made a simple test: running a webserver serving a big file
>> locally, using haproxy, my wrapper and the systemd service. I started a
>> download and, during this download, reloaded haproxy. I am using
>> nbproc = 2.
>>
>> What happened? When I started haproxy, I ended up with a wrapper
>> launching a child which itself launched two children; we'll call them
>> ha1, ha11 and ha12. Then when I reloaded, the wrapper launched a new
>> child which launched two new children: ha2, ha21 and ha22. ha11 and thus
>> ha1 were still there until the download had finished, and ha12 went to
>> zombie state.
>
> Then if ha1 was still there, I don't understand how it did not prevent
> the new process from binding to the port. Could you please check whether
> ha1 is still bound to the port once the new process runs? That's what I
> don't understand.
>
>> ha2, ha21 and ha22 successfully showed up and took all new connections.
>> Once the download had finished, ha11 exited, ha12 too (waitpid making it
>> leave the zombie state) and then ha1, leaving us with only ha2, ha21 and
>> ha22. I think this is the expected behaviour, so there doesn't seem to
>> be any bug here.
>
> Yes, it's the expected behaviour, but I don't understand *why* it works,
> so it is very possible that we're having a bug somewhere else making this
> work as a side effect.
>
>> For the EINTR stuff, I'm not sure at all, as I'm not really familiar
>> with it, so I will give it a look.
>
> Typically I would replace:
>
> 	waitpid(pid, NULL, 0);
>
> with:
>
> 	while (waitpid(pid, NULL, 0) == -1 && errno == EINTR);
>
> For instance, when your process receives SIGTTOU/SIGTTIN upon a failed
> attempt of a new process to start, the old one very likely skips a few
> children (one per signal).
>
> If it can help you, here's how to test for the worst case:
>
>   - have a running process with a simple configuration bound to one port:
>
> 	listen foo
> 		bind :8000
>
>   - then have a second configuration which will not work due to a double
>     bind on the same port, and another bind on the first port:
>
> 	listen foo
> 		bind :8000
> 	listen fail1
> 		bind :8001
> 	listen fail2
> 		bind :8001
>
>   - when the first one is running, try to start the second one. It will
>     fail to bind to :8000, will send a SIGTTOU to process 1 and try
>     again. Then it will fail to bind :8001 without knowing it's not
>     because of #1, so it will wait a bit, believing it's #1 which has not
>     yet released it, and then it will abort, sending SIGTTIN to #1 to
>     inform it that it gives up and that #1 must continue its job as if
>     nothing happened.
>
>   - process 1 should then just remain unaffected. And restarting #1 with
>     the fail2 listener commented out should work as expected.
>
> Best regards,
> Willy

Maybe this can help you understand why it works.

Before the download:

	haproxy 6090 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
	haproxy 6092 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
	haproxy 6093 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)

During the download:

	haproxy 6090 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
	haproxy 6092 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
	haproxy 6093 root  1u IPv4 57972 0t0 TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
	haproxy 6093 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)

During the download, after the reload:

	haproxy 6093 root  1u IPv4 57972 0t0 TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
	haproxy 6239 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
	haproxy 6240 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
	haproxy 6241 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)

After the download:

	haproxy 6239 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
	haproxy 6240 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
	haproxy 6241 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility
On Sat, Feb 09, 2013 at 11:45:57AM +0100, Marc-Antoine Perennou wrote:
> Maybe this can help you understand why it works.
>
> Before the download:
>
> 	haproxy 6090 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
> 	haproxy 6092 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
> 	haproxy 6093 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
>
> During the download:
>
> 	haproxy 6090 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
> 	haproxy 6092 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
> 	haproxy 6093 root  1u IPv4 57972 0t0 TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
> 	haproxy 6093 root  4u IPv4 53819 0t0 TCP *:http (LISTEN)
>
> During the download, after the reload:
>
> 	haproxy 6093 root  1u IPv4 57972 0t0 TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
> 	haproxy 6239 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
> 	haproxy 6240 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)
> 	haproxy 6241 root  4u IPv4 54622 0t0 TCP *:http (LISTEN)

OK, so the old parent process got the signal and left the place. I don't
know why, but I thought that the pidfile only contained the children's
pids; here it's clear that the parent got the signal as well. That's fine
then.

Thanks,
Willy
Re: SSL handshake failure
Can I help in any way?

On Fri, Feb 8, 2013 at 11:29 PM, Willy Tarreau w...@1wt.eu wrote:
> On Fri, Feb 08, 2013 at 11:27:31PM +0400, Samat Galimov wrote:
>> You are right. After I start haproxy with the patch, it establishes the
>> first connection normally, as it should. All following connections
>> randomly produce an error as before.
>
> OK, so maybe there are still some places where the change is needed.
> We'll audit the code again to try to find something else.
>
> Thanks,
> Willy
Re: SSL handshake failure
On Sat, Feb 09, 2013 at 08:45:50PM +0400, Samat Galimov wrote:
> Can I help in any way?

Not yet, I think; I'll check with Emeric if we can find other locations
where errors might remain uncaught.

Thanks!
Willy