Re: Haproxy refuses new connections when doing a reload followed by a restart

2017-10-12 Thread Apollon Oikonomopoulos
On 17:30 Thu 12 Oct , William Lallemand wrote:
> On Thu, Oct 12, 2017 at 05:50:52PM +0300, Apollon Oikonomopoulos wrote:
> > Yes, there are. systemd will only perform a single operation on a 
> > unit at a time, and will queue up the rest. When you inform systemd 
> > that something (startup/reload) is in progress, it will not let any 
> > other action happen until the first operation is finished. Now it's 
> > trivial to issue a ton of reloads in a row that will leave a ton of 
> > old processes lying around until they terminate.
>  
> I don't think you can, either with the master-worker or the wrapper; it was
> one of the problems we had in the past.
> 
> The master-worker waits until it is ready to handle the signals, and the
> wrapper waits for a pipe to be closed on the children's side to handle signals.

Interesting, thanks! I guess I'm still stuck in the 1.6 era, I have some 
catch-up to do :)

Regards,
Apollon



Re: Haproxy refuses new connections when doing a reload followed by a restart

2017-10-12 Thread William Lallemand
On Thu, Oct 12, 2017 at 05:50:52PM +0300, Apollon Oikonomopoulos wrote:
> > 
> > One helpful feature I read in the documentation is the usage of
> > sd_notify(..., "READY=1"). It can be useful for configuration files that
> > take time to process, for example those with a lot of ssl frontends. This
> > signal could be sent once the children have been forked.
> > 
> > It's difficult to know when a reload is completely finished (old processes
> > killed) in case of long TCP sessions. So, if we use this system, isn't
> > there a risk of triggering a timeout in systemd on the reload?
> 
> The Reload timeout is apparently controlled by TimeoutStartSec in 
> systemd.
> 
> > If we want to use this feature for the reload, it should be done after the
> > fork of the new processes, not after the old processes have left, because
> > the new processes are ready to receive traffic at that stage.
> 
> That's true. OTOH the problem with haproxy-systemd-wrapper is that once 
> it re-exec's itself it loses track of the old processes completely 
> (IIRC),

That's right, but we won't fix it in the wrapper: the current architecture
doesn't allow it easily, and it's not reasonable to backport the master-worker
to a stable branch. Those problems will be fixed with the master-worker in 1.8.

> combined with the fact that old processes may eat up a lot of 
> memory. There are cases where you would prefer breaking a long TCP 
> session after 30s if it would give you back 2GB of RSS, to having the 
> process lying around just for one client.

Sure, that can be done in the haproxy config file with the hard-stop-after 
keyword.
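
For example (just a minimal sketch, the 30s value is only an illustration):

    global
        # force any old process still draining connections to exit
        # at most 30s after a reload/soft-stop
        hard-stop-after 30s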

> > Are there really advantages to letting systemd know when a reload is
> > finished or when a process is ready?
> 
> Yes, there are. systemd will only perform a single operation on a unit 
> at a time, and will queue up the rest. When you inform systemd that 
> something (startup/reload) is in progress, it will not let any other 
> action happen until the first operation is finished. Now it's trivial to 
> issue a ton of reloads in a row that will leave a ton of old processes 
> lying around until they terminate.
 
I don't think you can, either with the master-worker or the wrapper; it was one
of the problems we had in the past.

The master-worker waits until it is ready to handle the signals, and the
wrapper waits for a pipe to be closed on the children's side to handle signals.

> The other advantage with Type=notify services is that systemd will wait 
> for READY=1 before starting units with After=haproxy (although HAProxy 
> is really a "leaf" kind of service).
> 

That's interesting, nobody ever asked for that, but I see a few cases where it
can be useful.

> Note that the dependency is really thin, and you can always make it a 
> compile-time option. 
> 
> Regards,
> Apollon

Cheers,

-- 
William Lallemand



Re: Haproxy refuses new connections when doing a reload followed by a restart

2017-10-12 Thread Apollon Oikonomopoulos
On 16:17 Thu 12 Oct , William Lallemand wrote:
> Hi,
> 
> On Thu, Oct 12, 2017 at 01:19:58PM +0300, Apollon Oikonomopoulos wrote:
> > The biggest issue here is that we are using a signal to trigger the 
> > reload (which is a complex, non-atomic operation) and let things settle 
> > on their own. Systemd assumes that as soon as the signal is delivered 
> > (i.e.  the ExecReload command is done), the reload is finished, while in 
> > our case the reload is finished when the old haproxy process is really 
> > dead. Using a signal to trigger the reload is handy, so we could keep 
> > that, but the wrapper would need some changes to make reloads more 
> > robust:
> > 
> >  1. It should use sd_notify(3) to communicate the start/stop/reload 
> > status to systemd (that would also mean converting the actual 
> > service to Type=notify). This way no other operation will be 
> > performed on the unit until the reload is finished and the process 
> > group is in a known-good state.
> > 
> >  2. It should handle the old process better: apart from relying on the 
> > new haproxy process for killing the old one, it should explicitly 
> > SIGKILL it after a given timeout if it's not dead yet and make sure 
> > reloads are timeboxed.
> > 
> > IIUC, in 1.8 the wrapper has been replaced by the master process which 
> > seems to do point 2 above, but point 1 is something that should still be 
> > handled IMHO.
> 
> One helpful feature I read in the documentation is the usage of
> sd_notify(..., "READY=1"). It can be useful for configuration files that
> take time to process, for example those with a lot of ssl frontends. This
> signal could be sent once the children have been forked.
> 
> It's difficult to know when a reload is completely finished (old processes
> killed) in case of long TCP sessions. So, if we use this system, isn't
> there a risk of triggering a timeout in systemd on the reload?

The Reload timeout is apparently controlled by TimeoutStartSec in 
systemd.

> If we want to use this feature for the reload, it should be done after the
> fork of the new processes, not after the old processes have left, because
> the new processes are ready to receive traffic at that stage.

That's true. OTOH the problem with haproxy-systemd-wrapper is that once 
it re-exec's itself it loses track of the old processes completely 
(IIRC), combined with the fact that old processes may eat up a lot of 
memory. There are cases where you would prefer breaking a long TCP 
session after 30s if it would give you back 2GB of RSS, to having the 
process lying around just for one client.

> 
> However I'm not sure it's that useful, you can know when a process is ready
> using the logs, and it will add specific code for systemd and a dependency.
> 
> Are there really advantages to letting systemd know when a reload is finished
> or when a process is ready?

Yes, there are. systemd will only perform a single operation on a unit 
at a time, and will queue up the rest. When you inform systemd that 
something (startup/reload) is in progress, it will not let any other 
action happen until the first operation is finished. Now it's trivial to 
issue a ton of reloads in a row that will leave a ton of old processes 
lying around until they terminate.

The other advantage with Type=notify services is that systemd will wait 
for READY=1 before starting units with After=haproxy (although HAProxy 
is really a "leaf" kind of service).

Note that the dependency is really thin, and you can always make it a 
compile-time option. 

Regards,
Apollon



Re: Haproxy refuses new connections when doing a reload followed by a restart

2017-10-12 Thread William Lallemand
Hi,

On Thu, Oct 12, 2017 at 01:19:58PM +0300, Apollon Oikonomopoulos wrote:
> The biggest issue here is that we are using a signal to trigger the 
> reload (which is a complex, non-atomic operation) and let things settle 
> on their own. Systemd assumes that as soon as the signal is delivered 
> (i.e.  the ExecReload command is done), the reload is finished, while in 
> our case the reload is finished when the old haproxy process is really 
> dead. Using a signal to trigger the reload is handy, so we could keep 
> that, but the wrapper would need some changes to make reloads more 
> robust:
> 
>  1. It should use sd_notify(3) to communicate the start/stop/reload 
> status to systemd (that would also mean converting the actual 
> service to Type=notify). This way no other operation will be 
> performed on the unit until the reload is finished and the process 
> group is in a known-good state.
> 
>  2. It should handle the old process better: apart from relying on the 
> new haproxy process for killing the old one, it should explicitly 
> SIGKILL it after a given timeout if it's not dead yet and make sure 
> reloads are timeboxed.
> 
> IIUC, in 1.8 the wrapper has been replaced by the master process which 
> seems to do point 2 above, but point 1 is something that should still be 
> handled IMHO.

One helpful feature I read in the documentation is the usage of
sd_notify(..., "READY=1"). It can be useful for configuration files that take
time to process, for example those with a lot of ssl frontends. This signal
could be sent once the children have been forked.
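
As a rough illustration only (assuming we link against libsystemd; where
exactly these calls would live in the wrapper/master is hypothetical):

    #include <systemd/sd-daemon.h>

    /* once all the children have been forked and are ready to accept traffic */
    sd_notify(0, "READY=1");

    /* and around a reload: */
    sd_notify(0, "RELOADING=1");
    /* ... parse the new configuration, fork the new children ... */
    sd_notify(0, "READY=1");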

It's difficult to know when a reload is completely finished (old processes
killed) in case of long TCP sessions. So, if we use this system, isn't there a
risk of triggering a timeout in systemd on the reload? If we want to use this
feature for the reload, it should be done after the fork of the new processes,
not after the old processes have left, because the new processes are ready to
receive traffic at that stage.

However, I'm not sure it's that useful: you can tell when a process is ready
from the logs, and it would add systemd-specific code and a dependency.

Are there really advantages to letting systemd know when a reload is finished
or when a process is ready?

Cheers,

-- 
William Lallemand



Sponsorship

2017-10-12 Thread Alex Smith
Hello!

My name is Alex and I represent the website bestvpnrating.com.

Our company would like to sponsor your project. How can we be listed among
your sponsors?

Looking forward to your reply,

Best regards,

Alex Smith.


yahrt! (yet another hardware recommendation thread)

2017-10-12 Thread Elias Abacioglu
Hi guys

I know HW recommendations are based on the type of load.
We currently have an LB setup consisting of 3 Internet-facing nodes used for
transactional requests.
Currently around 100k/s HTTP requests (+12k TCP session rate).
This is split across the 3 nodes, and the average session time is 1.5s for
HTTP traffic.
These LBs also do TCP relaying to other HAProxy nodes doing TLS termination.
They relay around 30k SSL requests/s (12k TCP session rate).
We want to keep our latency as low as possible.

Currently we have 3x 4-core nodes running as HTTP and TCP load balancers.
One of these 3 nodes is older and we are thinking of replacing it, or all 3 of
the internet-facing nodes.

The two newer nodes can handle the traffic on their own, but I feel it's close
to the edge, especially if we were to get more traffic. And we've had cases
where HAProxy is in a state where it doesn't work with just two (like when
you've done a bunch of reloads instead of restarts).
They have E3-1280 v6 @ 3.90GHz.

I'm having a hard time choosing hardware.
One thought is that in the future TLS termination will probably increase;
currently it's 100k HTTP + 30k TLS (which is currently relayed to the HAProxy
TLS LBs). But one would perhaps design the machines to be a bit future-proof.

And 1-socket, 4-core nodes feel like they won't scale that well.

On one hand I have to choose between a CPU at +2GHz with more cores or a CPU
at +3GHz with fewer cores.
And after that I have to choose whether to go with 2 sockets and fewer nodes,
or 1 socket and more nodes.
One could argue that throwing more hardware at it would make up for some of
the problems it causes (the problems that come with multiple cores + sockets).


For instance, looking at Dell's latest generation (G14) CPU offering, and at
what I consider decently priced CPUs, you have the option to go for around
+2GHz multi-core:
Intel® Xeon® Gold 6132 2.6G,14C
Intel® Xeon® Gold 5120 2.2G,14C - little bit cheaper

Or you can go for +3GHz with a bit fewer cores:
Intel Xeon Gold 5122 3.6G,4C - little bit cheaper
Intel Xeon Gold 6128 3.4G,6C - not available for the PE R440 (only PE R640)
Intel Xeon Gold 6136 3.0G,12C - not available for the PE R440 (only PE R640)

And then you have the super expensive 3GHz CPUs:
Intel Xeon Gold 6134 3.2G,8C - not available for the PE R440 (only PE R640)
Intel Xeon Gold 6154 3.0G,18C - not available for the PE R440 (only PE R640)
Intel Xeon Gold 6144 3.5G,8C - not available for the PE R440 (only PE R640)

I have a hard time choosing:
2GHz or 3GHz?
1 socket or 2 sockets?

Any advice?

Thanks,
Elias


Re: Haproxy refuses new connections when doing a reload followed by a restart

2017-10-12 Thread Apollon Oikonomopoulos
Hi all,

On 22:01 Wed 04 Oct , Lukas Tribus wrote:
> Hello Moemen,
> 
> 
> Am 04.10.2017 um 19:21 schrieb Moemen MHEDHBI:
> >
> > I am wondering if this is actually the expected behaviour and if maybe
> > restart/stop should just shut down the process and its open connections.
> > I have made the following tests:
> > 1/ keeping an open connection and then doing a restart works correctly
> > without waiting for existing connections to be closed.
> 
> You're right, I got confused there.
> 
> Stop or restart is supposed to kill existing connections without any 
> timeouts, and
> systemd would signal it with a SIGTERM to the systemd-wrapper:
> 
> https://cbonte.github.io/haproxy-dconv/1.7/management.html#4
> 
> 
> 
> > I think it makes more sense to say that restart will not wait for
> > established connections.
> 
> Correct, that's the documented behavior per the haproxy documentation, and
> really the implicit assumption when talking about stopping/restarting in a
> process management context (I need more coffee).
> 
> 
> 
> >   Otherwise there will be no difference between
> > reload and restart unless there is something else I am not aware of.
> > If we need to fix 2/, a possible solution would be:
> > - Set KillMode to "control-group" rather than "mixed" (the current
> > value) in the systemd unit file.
> 
> Indeed the mixed killmode was a conscious choice:
> https://marc.info/?l=haproxy&m=141277054505608&w=2
> 
> 
> I guess the problem is that when a reload happens before a restart and the
> pre-reload systemd-wrapper process is still alive, systemd gets confused by
> that old process and therefore refrains from starting up the new instance.

The biggest issue here is that we are using a signal to trigger the 
reload (which is a complex, non-atomic operation) and let things settle 
on their own. Systemd assumes that as soon as the signal is delivered 
(i.e.  the ExecReload command is done), the reload is finished, while in 
our case the reload is finished when the old haproxy process is really 
dead. Using a signal to trigger the reload is handy, so we could keep 
that, but the wrapper would need some changes to make reloads more 
robust:

 1. It should use sd_notify(3) to communicate the start/stop/reload 
status to systemd (that would also mean converting the actual 
service to Type=notify). This way no other operation will be 
performed on the unit until the reload is finished and the process 
group is in a known-good state.

 2. It should handle the old process better: apart from relying on the 
new haproxy process for killing the old one, it should explicitly 
SIGKILL it after a given timeout if it's not dead yet and make sure 
reloads are timeboxed.

IIUC, in 1.8 the wrapper has been replaced by the master process which 
seems to do point 2 above, but point 1 is something that should still be 
handled IMHO.
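
As a rough sketch of what point 1 could look like on the unit side (the paths,
flags and reload signal below are illustrative only, not a proposal for the
actual unit file):

    [Service]
    Type=notify
    NotifyAccess=main
    # illustrative paths/flags
    ExecStart=/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
    ExecReload=/bin/kill -USR2 $MAINPID
    KillMode=mixed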

Regards,
Apollon



RE: Unable to modify haproxy stats url header

2017-10-12 Thread Suraj Bora -X (surbora - HCL AMERICA INC at Cisco)
Hi Lukas, Willy,

Thanks for the confirmation.

We are running some security scans on haproxy URLs and found that the HAProxy
status URL has the following vulnerabilities:
1. Cacheable SSL Page Found
2. Missing HTTP Strict-Transport-Security Header Query

To resolve this we need to update the HTTP response with the parameters below:

rspadd Cache-Control:\ no-store,no-cache,private
rspadd Pragma:\ no-cache
rspadd Strict-Transport-Security:

It would be useful if we had this functionality.

Thanks and Regards,
Suraj Bora
surb...@cisco.com

-Original Message-
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: 11 October 2017 23:27
To: Lukas Tribus 
Cc: Suraj Bora -X (surbora - HCL AMERICA INC at Cisco) ; 
haproxy@formilux.org
Subject: Re: Unable to modify haproxy stats url header

Hi Lukas,

On Wed, Oct 11, 2017 at 06:23:23PM +0200, Lukas Tribus wrote:
> Hello Suraj, hello Willy,
> 
> 
> > frontend stats_proxy
> >     bind :ssl crt  no-sslv3 
> > no-tlsv10 ciphers 
> >     mode http
> >     default_backend stats_server
> >     rspadd Cache-Control:\ no-store,no-cache,private
> >     rspadd Pragma:\ no-cache
> >     rspadd Strict-Transport-Security:
> >
> > backend stats_server
> >     mode http
> >     option httpclose
> >     option abortonclose
> >     stats enable
> >     stats refresh 60s
> >     stats hide-version
> 
> rspadd does not work for the stats backend.
> 
> This is definitely a change in behavior in 1.5-dev due to 70730ddd
> ("MEDIUM: http: enable analysers to have keep-alive on stats"):
> 
> 
> (from the 70730ddd commit message):
> > We ensure to skip filters because we don't want to unexpectedly 
> > block a response nor to mangle response headers.
> 
> Skipping filters causes the behavior reported in this thread.
> 
> 
> Do we support this use case though? Do we consider this a regression?
> What do you think, Willy?

Originally it did not work, as the stats contents were directly injected into 
the response buffer without any analyser, but since we moved it to an applet, 
it has become possible to support compression and keep-alive, and by extension 
other HTTP processing.

I tend to think that if some users rely on this behaviour, we should make 
reasonable efforts to try to make it work again. If there's a technical 
showstopper, I'm fine with that but I don't have any in mind and I suspect it's 
more related to the accidental lack of an analyser flag that nobody considered 
worth setting on the response channel when switching to the stats.

To be honest, now looking at the code I'm a bit puzzled, because I no longer 
understand either how/when the response analysers needed for the compression 
and/or keep-alive are set, or how the AN_RES_HTTP_PROCESS_BE flag is removed. 
I'll probably have to check deeper, but now this looks more like an accidental 
removal.

Thanks,
Willy