Re: HAProxy and systemd compatibility

2013-02-09 Thread Willy Tarreau
Hi Marc-Antoine,

On Fri, Feb 08, 2013 at 03:58:45PM +0100, Marc-Antoine Perennou wrote:
 Hi,
 
 Currently, to reload haproxy configuration, you have to use -sf.
 
 Systemd's philosophy is for daemons not to fork by themselves, but rather to let
 the init process do it for them.
 
 My first patch adds a new option -Ds which is exactly like -D, but instead of
 forking n times to get n jobs running and then exiting, prefers to wait for
 all the children it just created. With this done, haproxy becomes more
 systemd-compliant, without changing anything for other systems.
 
 Now comes a problem with the -sf way of doing things. First of all, in the
 systemd world, reload commands should be oneshot ones, which means they should
 not become the new main process but rather be a tool which makes a call to it
 and then exits. With the current approach, the reload command becomes the new
 main process and, moreover, it makes the previous one exit. Systemd only
 tracks the main process: seeing it end, it assumes it either finished or
 failed, and kills everything remaining as a garbage collector. We then end up
 with no haproxy running at all.
 
 My second patch is a wrapper around haproxy; no changes at all have been made
 to haproxy itself, so it's not intrusive and doesn't change anything for other
 hosts. What this wrapper does is basically launch haproxy as a child, listen
 for the SIGUSR2 signal (chosen so as not to conflict with haproxy itself), and
 spawn a new haproxy with -sf as a child to replace the first one.
 
 The third patch is a logical continuation of the first two, providing a
 systemd service file.
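
A minimal sketch of such a wrapper loop, assuming a fixed binary path and a single configuration argument; this is not the actual haproxy-systemd-wrapper code (the real wrapper reads the pidfile written by haproxy and passes every old pid to -sf, and error handling is omitted here):

/* Sketch only: launch haproxy in foreground mode, and on SIGUSR2 start a new
 * haproxy with -sf <old pid> so it takes over the listening sockets. The
 * binary path, the single -f argument and the pid handling are assumptions
 * for illustration; a production wrapper would also block SIGUSR2 and use
 * sigsuspend()/ppoll() to avoid missing a reload raised between the flag
 * check and waitpid(). */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HAPROXY_BIN "/usr/sbin/haproxy"   /* assumed path */

static volatile sig_atomic_t reload_requested;

static void on_sigusr2(int sig)
{
	(void)sig;
	reload_requested = 1;
}

static pid_t spawn_haproxy(const char *cfg, pid_t old_pid)
{
	char old[32];
	pid_t pid = fork();

	if (pid != 0)
		return pid;                      /* parent (or fork failure): return to caller */

	if (old_pid > 0) {
		snprintf(old, sizeof(old), "%d", (int)old_pid);
		execl(HAPROXY_BIN, "haproxy", "-f", cfg, "-Ds", "-sf", old, (char *)NULL);
	} else {
		execl(HAPROXY_BIN, "haproxy", "-f", cfg, "-Ds", (char *)NULL);
	}
	_exit(1);                                /* only reached if exec failed */
}

int main(int argc, char **argv)
{
	const char *cfg = argc > 1 ? argv[1] : "/etc/haproxy/haproxy.cfg";
	struct sigaction sa;
	pid_t child;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr2;              /* no SA_RESTART: waitpid() returns EINTR on reload */
	sigaction(SIGUSR2, &sa, NULL);

	child = spawn_haproxy(cfg, 0);

	for (;;) {
		if (reload_requested) {
			reload_requested = 0;
			child = spawn_haproxy(cfg, child);  /* new haproxy replaces the old one via -sf */
		}
		if (waitpid(-1, NULL, 0) == -1 && errno != EINTR)
			break;                              /* ECHILD: no children left */
	}
	return 0;
}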

Thanks for the details. It's nice to have a cover mail to explain what
the patches do; however, next time please don't forget that this first
mail is lost when committing patches, and that the relevant parts should
be present in their respective patches' commit messages.

I'm replying to your other mails individually to make tracking easier.

Willy




Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility

2013-02-09 Thread Willy Tarreau
On Fri, Feb 08, 2013 at 03:58:46PM +0100, Marc-Antoine Perennou wrote:
 @@ -1493,8 +1499,13 @@ int main(int argc, char **argv)
 	px = px->next;
 	}
 
 -	if (proc == global.nbproc)
 +	if (proc == global.nbproc) {
 +		if (global.mode & MODE_SYSTEMD) {
 +			for (proc = 0; proc < global.nbproc; proc++)
 +				waitpid(children[proc], NULL, 0);
 +		}
 	exit(0); /* parent must leave */
 +	}
 
 	/* if we're NOT in QUIET mode, we should now close the 3 first FDs to ensure
 	 * that we can detach from the TTY. We MUST NOT do it in other cases since

I think we have an issue here, which should not be too hard to fix. The
parent is the one which binds to all the ports, so here it still owns
them. This means that a soft restart of a new process will fail until
the parent leaves, since it will not release these ports. And the parent
will not leave until all of its children have terminated. Or maybe I'm
missing something :-/

Also wouldn't it be worth catching EINTR here in the loop ? Otherwise
I fear that random signals sent to the process will make the for loop
progress to the next child even though previous ones are not dead. This can
easily happen during debugging sessions (ctrl-Z/bg) or when people
mistakenly send a SIGHUP to the old process.

Willy




Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper

2013-02-09 Thread Willy Tarreau
On Fri, Feb 08, 2013 at 03:58:47PM +0100, Marc-Antoine Perennou wrote:
 +static void usage(const char *progname)
 +{
 +	fprintf(stderr, "Usage: %s [-f <cfgfile>] [-p <pidfile>]\n", progname);

Here I think it would be worth supporting most of the other command line
arguments. For instance, -L is mandatory when session replication is enabled
between multiple peers. -C is used by people who have to load many ACLs or
certs and want to chdir there to load the files. -c is handy to validate
configurations so that they don't have to switch between two different tools
and syntaxes. I think we can ignore the various debugging flags now however.
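
A rough sketch of that kind of pass-through, not the wrapper's actual code (HAPROXY_BIN and the helper name are illustrative assumptions), would be to rebuild argv around the haproxy binary and exec it:

/* Sketch: forward every argument the wrapper received (-f, -p, -L, -C, -c, ...)
 * to the haproxy child, appending the wrapper-specific option. */
#include <stdlib.h>
#include <unistd.h>

#define HAPROXY_BIN "/usr/sbin/haproxy"      /* assumed path */

static void exec_haproxy(int wrapper_argc, char **wrapper_argv)
{
	/* slots for: binary name + forwarded args + "-Ds" + terminating NULL */
	char **argv = calloc(wrapper_argc + 3, sizeof(*argv));
	int i;

	if (!argv)
		_exit(1);

	argv[0] = HAPROXY_BIN;
	for (i = 1; i < wrapper_argc; i++)
		argv[i] = wrapper_argv[i];       /* pass the user's options through untouched */
	argv[i++] = "-Ds";                       /* keep the intermediate process in foreground */
	argv[i] = NULL;

	execv(HAPROXY_BIN, argv);
	_exit(1);                                /* only reached if execv() failed */
}

int main(int argc, char **argv)
{
	exec_haproxy(argc, argv);
	return 1;
}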

Regards,
Willy




Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility

2013-02-09 Thread Marc-Antoine Perennou
On 9 February 2013 09:45, Willy Tarreau w...@1wt.eu wrote:

 On Fri, Feb 08, 2013 at 03:58:46PM +0100, Marc-Antoine Perennou wrote:
  @@ -1493,8 +1499,13 @@ int main(int argc, char **argv)
  	px = px->next;
  	}
 
  -	if (proc == global.nbproc)
  +	if (proc == global.nbproc) {
  +		if (global.mode & MODE_SYSTEMD) {
  +			for (proc = 0; proc < global.nbproc; proc++)
  +				waitpid(children[proc], NULL, 0);
  +		}
  	exit(0); /* parent must leave */
  +	}
 
  	/* if we're NOT in QUIET mode, we should now close the 3 first FDs to ensure
  	 * that we can detach from the TTY. We MUST NOT do it in other cases since

 I think we have an issue here, which should not be too hard to fix. The
 parent is the one which binds to all the ports, so here it still owns
 them. This means that a soft restart of a new process will fail until
 the parent leaves, since it will not release these ports. And the parent
 will not leave until all of its children have terminated. Or maybe I'm
 missing something :-/

 Also wouldn't it be worth catching EINTR here in the loop ? Otherwise
 I fear that random signals sent to the process will make the for loop
 progress to the next child even though previous ones are not dead. This can
 easily happen during debugging sessions (ctrl-Z/bg) or when people
 mistakenly send a SIGHUP to the old process.

 Willy



I just made a simple test, running a webserver serving a big file locally,
using haproxy, my wrapper and the systemd service. I started a download and,
during this download, reloaded haproxy. I was using nbproc = 2.
What happened?
When I started haproxy, I ended up with a wrapper launching a child which
itself launched two children; we'll call them ha1, ha11 and ha12. Then when I
reloaded, the wrapper launched a new child which launched two new children:
ha2, ha21 and ha22.
ha11, and thus ha1, were still there until the download had finished, and ha12
went to zombie state.
ha2, ha21 and ha22 successfully showed up and took all new connections.
Once the download had finished, ha11 exited, ha12 too (waitpid making it leave
the zombie state) and then ha1, leaving us with only ha2, ha21 and ha22.
I think this is the expected behaviour, so there doesn't seem to be any bug
here.

For the EINTR stuff, I'm not sure at all, not really familiar with it, so I
will give it a look


Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper

2013-02-09 Thread Marc-Antoine Perennou
I'll do even simpler: when submitting V2 of this patchset with the EINTR issue
fixed, I'll basically forward every arg given to the wrapper to all the
children themselves.
The tool is not really meant to be used by the user himself though, but rather
to be launched by the systemd service.
Is SIGUSR2 ok here? I first did it with SIGUSR1 but then children couldn't
bind to this signal on reload, since it was already a USR1 action, so I took
the first one not colliding.


On 9 February 2013 09:49, Willy Tarreau w...@1wt.eu wrote:

 On Fri, Feb 08, 2013 at 03:58:47PM +0100, Marc-Antoine Perennou wrote:
  +static void usage(const char *progname)
  +{
  +	fprintf(stderr, "Usage: %s [-f <cfgfile>] [-p <pidfile>]\n", progname);

 Here I think it would be worth supporting most of the other command line
 arguments. For instance, -L is mandatory when session replication is
 enabled
 between multiple peers. -C is used by people who have to load many ACLs or
 certs and want to chdir there to load the files. -c is handy to validate
 configurations so that they don't have to switch between two different
 tools
 and syntaxes. I think we can ignore the various debugging flags now
 however.

 Regards,
 Willy




Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility

2013-02-09 Thread Willy Tarreau
Hi,

On Sat, Feb 09, 2013 at 10:44:04AM +0100, Marc-Antoine Perennou wrote:
 I just made a simple test, running a webserver serving a big file locally,
 using haproxy, my wrapper and the systemd service. I started a download and,
 during this download, reloaded haproxy. I was using nbproc = 2.
 What happened?
 When I started haproxy, I ended up with a wrapper launching a child which
 itself launched two children; we'll call them ha1, ha11 and ha12. Then when I
 reloaded, the wrapper launched a new child which launched two new children:
 ha2, ha21 and ha22.
 ha11, and thus ha1, were still there until the download had finished, and
 ha12 went to zombie state.

Then if ha1 was still there, I don't understand how it did not prevent the
new process from binding to the port. Could you please check ha1 is still
bound to the port once the process runs ? That's what I don't understand.

 ha2, ha21 and ha22 successfully showed up and took all new connections.
 Once the download had finished, ha11 exited, ha12 too (waitpid making it
 leave the zombie state) and then ha1, leaving us with only ha2, ha21 and ha22.
 I think this is the expected behaviour, so there doesn't seem to be any bug
 here.

Yes it's the expected behaviour, but I don't understand *why* it works, so
it is very possible that we're having a bug somewhere else making this work
as a side effect.

 For the EINTR stuff, I'm not sure at all, not really familiar with it, so I
 will give it a look

Typically I would replace :

 waitpid(pid, NULL, 0);

with

 while (waitpid(pid, NULL, 0) == -1 && errno == EINTR);

For instance, when your process receives SIGTTOU/SIGTTIN upon a failed
attempt of a new process to start, the old one very likely skips a few
children (one per signal).
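
Spelled out as a tiny standalone demo of the retry pattern (the forked children and their sleep() are arbitrary stand-ins, nothing here is haproxy code):

/* Demo of the EINTR-robust reaping pattern suggested above: fork a few
 * children, then wait for each one, retrying waitpid() whenever a signal
 * interrupts the call instead of silently skipping a live child. */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	pid_t children[2];
	int proc;

	for (proc = 0; proc < 2; proc++) {
		children[proc] = fork();
		if (children[proc] == 0) {
			sleep(proc + 1);   /* child: pretend to serve traffic for a while */
			_exit(0);
		}
	}

	for (proc = 0; proc < 2; proc++) {
		/* retry on EINTR so a stray signal does not skip a live child */
		while (waitpid(children[proc], NULL, 0) == -1 && errno == EINTR)
			;
	}
	printf("all children reaped\n");
	return 0;
}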

If it can help you, here's how to test for the worst case :

  - have a running process with a simple configuration bound to one port :

listen foo
 bind :8000

  - then have a second configuration which will not work due to a double
bind on the same port, and another bind on the first port :

listen foo
 bind :8000

listen fail1
 bind :8001

listen fail2
 bind :8001

  - when the first one is running, try to start the second one. It will
fail to bind to :8000, will send a SIGTTOU to process 1 and try again.
Then it will fail to bind :8001 without knowing it's not because of #1,
so it will wait a bit, believing it's #1 which has not yet released it,
and then it will abort, sending SIGTTIN to #1 to inform it that it gives
up and that #1 must continue its job as if nothing happened.

  - process 1 should then just remain unaffected. And restarting #1 with
the fail2 listener commented out should work as expected.

Best regards,
Willy




Re: [PATCH 2/3] MEDIUM: add haproxy-systemd-wrapper

2013-02-09 Thread Willy Tarreau
On Sat, Feb 09, 2013 at 10:47:23AM +0100, Marc-Antoine Perennou wrote:
 I'll do even simpler, when submitting V2 of this patchset with the EINTR
 issue,
 I'll basically forard every arg given to the wrapper to all the children
 themselves.
 The tool is not really meant to be used by the user himself though, but
 rather to
 be launched by the systemd service.

I understand but it's quite common to expect to reuse the same command
line everywhere.

 Is SIGUSR2 ok here ? I first did it with SIGUSR1 but then children couldn't
 bind
 to this signal on reload, since it was already a USR1 action, so I took the
 first one not colliding.

6 signals are used by haproxy :

  - SIGTTOU is used to make the old process temporarily release the ports
it is bound to so that a new process may attempt a startup
  - SIGTTIN is used to tell the old process to rebind the ports that were
unbound by SIGTTOU and continue its job as if nothing happened
  - SIGUSR1 is used to make the old process unbind and wait for the last
connection to complete before exiting
  - SIGTERM is used to make the old process go away without waiting for
connections to complete
  - SIGHUP is used to dump server states in the logs
  - SIGQUIT is used to dump memory pool usages to the logs
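
As an illustration of the graceful half of this interface, a small standalone sketch (the pidfile path is an assumption) could read the old pids and send them SIGUSR1, which is roughly what a new process started with -sf does once it has started successfully:

/* Hedged sketch: ask every old haproxy process listed in a pidfile to finish
 * gracefully (the SIGUSR1 behaviour described above). The pidfile path is an
 * assumption; real deployments would take it from the configuration. */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
	FILE *f = fopen("/var/run/haproxy.pid", "r");
	int pid;

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fscanf(f, "%d", &pid) == 1)
		kill((pid_t)pid, SIGUSR1);   /* unbind, finish existing connections, then exit */
	fclose(f);
	return 0;
}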

Additionally, I'd rather keep SIGUSR2 available for later use, because
the master-worker rework that Simon Horman did makes use of this signal
to restart the process and reopen config files.

However I think that your usage of SIGUSR2 is exactly the same as for
the master-worker mode so it seems to make a lot of sense to use it in
the wrapper.

Willy




Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility

2013-02-09 Thread Marc-Antoine Perennou
On 9 February 2013 11:06, Willy Tarreau w...@1wt.eu wrote:

 Hi,

 On Sat, Feb 09, 2013 at 10:44:04AM +0100, Marc-Antoine Perennou wrote:
  I just made a simple test, running a webserver serving a big file
 locally,
  using haproxy,
  my wrapper and systemd service. I started a download and during this
  download,
  reloaded haproxy. I was using nbproc = 2.
  What happened ?
  When I started haproxy, I ended up with a wrapper launching a child
  launching itself
  two children, we'll call them ha1, ha11 and ha12. Then when I reloaded,
 the
  wrapper
  launched a new child which launched two new children, ha2, ha21 and ha22.
  ha11 and thus ha1 were still there until the download had finished, ha12
  got to zombie state.

 Then if ha1 was still there, I don't understand how it did not prevent the
 new process from binding to the port. Could you please check ha1 is still
 bound to the port once the process runs ? That's what I don't understand.

  ha2, ha21 and ha22 successfully have shown up and take all new
 connections.
  Once the download has finished, ha11 exited, ha12 too (waitpid making it
  leave the zombie state)
  and then ha11, leaving us with only ha2, ha21 and ha22.
  I think this is the expected behaviour, so there don't seem to be any bug
  here.

 Yes it's the expected behaviour, but I don't understand *why* it works, so
 it is very possible that we're having a bug somewhere else making this work
 as a side effect.

  For the EINTR stuff, I'm not sure at all, not really familiar with it,
 so I
  will give it a look

 Typically I would replace :

  waitpid(pid, NULL, 0);

 with

  while (waitpid(pid, NULL, 0) == -1 && errno == EINTR);

 For instance, when your process receives SIGTTOU/SIGTTIN upon a failed
 attempt of a new process to start, the old one very likely skips a few
 children (one per signal).

 If it can help you, here's how to test for the worst case :

   - have a running process with a simple configuration bound to one port :

 listen foo
  bind :8000

   - then have a second configuration which will not work due to a double
 bind on the same port, and another bind on the first port :

 listen foo
  bind :8000

 listen fail1
  bind :8001

 listen fail2
  bind :8001

   - when the first one is running, try to start the second one. It will
 fail to bind to :8000, will send a SIGTTOU to process 1 and try again.
 Then it will fail to bind :8001 without knowing it's not because of #1,
 so it will wait a bit, believing it's #1 which has not yet released it,
 and then it will abort, sending SIGTTIN to #1 to inform it that it
 gives
 up and that #1 must continue its job as if nothing happened.

   - process 1 should then just remain unaffected. And restarting #1 with
 the fail2 listener commented out should work as expected.

 Best regards,
 Willy


Maybe this can help you understand why it works:

Before the download:
haproxy   6090 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
haproxy   6092 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
haproxy   6093 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)

During the download:
haproxy   6090 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
haproxy   6092 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
haproxy   6093 root    1u  IPv4  57972  0t0  TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
haproxy   6093 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)

During the download, after the reload:
haproxy   6093 root    1u  IPv4  57972  0t0  TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
haproxy   6239 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
haproxy   6240 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
haproxy   6241 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)

After the download:
haproxy   6239 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
haproxy   6240 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
haproxy   6241 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)


Re: [PATCH 1/3] MEDIUM: New cli option -Ds for systemd compatibility

2013-02-09 Thread Willy Tarreau
On Sat, Feb 09, 2013 at 11:45:57AM +0100, Marc-Antoine Perennou wrote:
 Maybe this can help you understand why it works:
 
 Before the download:
 haproxy   6090 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 haproxy   6092 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 haproxy   6093 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 
 During the download:
 haproxy   6090 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 haproxy   6092 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 haproxy   6093 root    1u  IPv4  57972  0t0  TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
 haproxy   6093 root    4u  IPv4  53819  0t0  TCP *:http (LISTEN)
 
 During the download, after the reload:
 haproxy   6093 root    1u  IPv4  57972  0t0  TCP Lou.local:http->Lou.local:40395 (ESTABLISHED)
 haproxy   6239 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
 haproxy   6240 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)
 haproxy   6241 root    4u  IPv4  54622  0t0  TCP *:http (LISTEN)

OK so the old parent process got the signal and left the place. I
don't know why, but I thought that the pidfile only contained the
children pids, but here it's clear that the parent got the signal
as well.

That's fine then.

Thanks,
Willy




Re: SSL handshake failure

2013-02-09 Thread Samat Galimov
Can I help in any way?


On Fri, Feb 8, 2013 at 11:29 PM, Willy Tarreau w...@1wt.eu wrote:

 On Fri, Feb 08, 2013 at 11:27:31PM +0400, Samat Galimov wrote:
  You are right.
  After I start haproxy with the patch it establishes the first connection
  normally, as it should.
  All following connections randomly fail with an error, as before.

 OK, so maybe there are still some places where the change is needed. We'll
 audit the code again to try to find something else.

 Thanks,
 Willy




Re: SSL handshake failure

2013-02-09 Thread Willy Tarreau
On Sat, Feb 09, 2013 at 08:45:50PM +0400, Samat Galimov wrote:
 Can I help in any way?

Not yet I think, I'll check with Emeric if we can find other locations
where errors might remain uncaught.

Thanks!
Willy