Re: Bug: DNS changes in 1.7.3+ break UNIX socket stats in daemon mode with resolvers on FreeBSD
On Fri, May 12, 2017 at 08:58:56AM +0200, Lukas Tribus wrote: > Hi, > > > Am 11.05.2017 um 21:13 schrieb Jim Pingle: > > On 05/11/2017 01:58 PM, Frederic Lecaille wrote: > >> I have reproduced (at home) the stats socket issue within a FreeBSD 9.3 VM. > >> > >> Replacing your call to close() by fd_delete() which removes the fd from > >> the fd set used by kevent *and close it* seems to fix at least the stats > >> socket issue. I do not know if there are remaining ones. > >> > >> I did not reproduced the kevent issue revealed by Lukas traces. But I > >> had other ones : ERR#57 'Socket is not connected' during sendto(). > >> > >> I attached a temporary patch to be validated and to let you perhaps > >> provide a better one as I have not double check everything. > > Fred, > > > > That seems to have fixed the problem for me. With that patch applied, > > web traffic passes and the UNIX socket responds. > > Confirmed, works for me too. Baptiste? Willy? Is this an acceptable fix? Yes definitely, not only an acceptable one, but the right fix. I understand why it happens to work on linux, by default close() unregisters FDs from epoll so it passed below the radar. I'm expecting to spend the day to dig through the ton of pending patches and fixes, so if the queue is not too long, I should reach this one as well today :-) Cheers, Willy
Re: Reloading maps?
On Thu, May 11, 2017 at 04:23:14PM -0700, James Brown wrote: > Is there any good way to reload a map, short of either (a) reloading > haproxy every time the map changes, or (b) feeding the entire map into the > control socket as a series of `set map` statements? > > I've got a map generated by an external program; we're currently doing (b) > and it feels a little fragile... We could possibly imagine implementing a "bulk import" mode on the CLI to address this. We could imagine bringing atomicity this way. There are also alternatives consisting in periodically retrieving them from a URL, as implemented in the enterprise version, but we don't have this here. Willy
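Pending such a bulk-import mode, approach (b) can at least be made less fragile by batching: the CLI accepts several commands in one request when separated by semicolons, so the whole map can be pushed in a single socket write. A rough sketch (the map name and file format are assumptions, not taken from James' setup):

```python
# Hypothetical helper: turn a haproxy map file into one semicolon-separated
# batch of "set map" commands, to be sent to the stats socket in a single
# write (e.g. piped through socat). This narrows, but does not remove, the
# window during which the map is only partially updated.

def map_to_cli_batch(map_text, map_name):
    cmds = []
    for line in map_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # haproxy map files allow blank lines and comments
        key, _, value = line.partition(" ")
        cmds.append("set map %s %s %s" % (map_name, key, value.strip()))
    return ";".join(cmds)

batch = map_to_cli_batch("example.com /new\n# comment\nold.example.com /legacy\n",
                         "/etc/haproxy/redirects.map")
print(batch)
```

The batch would then be delivered with something like `echo "$batch" | socat stdio /run/haproxy.sock`. Note that `set map` only updates existing entries; new keys still need `add map`, so a full replacement also involves `clear map`.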
Re: Bug: DNS changes in 1.7.3+ break UNIX socket stats in daemon mode with resolvers on FreeBSD
On 05/12/2017 09:37 AM, Willy Tarreau wrote: On Fri, May 12, 2017 at 08:58:56AM +0200, Lukas Tribus wrote: Hi, Am 11.05.2017 um 21:13 schrieb Jim Pingle: On 05/11/2017 01:58 PM, Frederic Lecaille wrote: I have reproduced (at home) the stats socket issue within a FreeBSD 9.3 VM. Replacing your call to close() by fd_delete() which removes the fd from the fd set used by kevent *and close it* seems to fix at least the stats socket issue. I do not know if there are remaining ones. I did not reproduced the kevent issue revealed by Lukas traces. But I had other ones : ERR#57 'Socket is not connected' during sendto(). I attached a temporary patch to be validated and to let you perhaps provide a better one as I have not double check everything. Fred, That seems to have fixed the problem for me. With that patch applied, web traffic passes and the UNIX socket responds. Confirmed, works for me too. Baptiste? Willy? Is this an acceptable fix? Yes definitely, not only an acceptable one, but the right fix. I understand why it happens to work on linux, by default close() unregisters FDs from epoll so it passed below the radar. Ok so Willy I will send a well-formed patch asap.
Re: haproxy not creating stick-table entries fast enough
On Tue, May 09, 2017 at 09:43:22PM -0700, redundantl y wrote:
> For example, I have tried with the latest versions of Firefox, Safari, and
> Chrome. With 30 elements on the page being loaded from the server they're
> all being loaded within 70ms of each other, the first 5 or so happening on
> the same millisecond. I'm seeing similar behaviour, being sent to
> alternating backend servers until it "settles" and sticks to just one.

That's only true after the browser starts to retrieve the main page, which gives it the indication that it needs to request such objects. You *always* have a first request before all other ones. The browser cannot guess it will have to retrieve many objects out of nowhere.

The principle of stickiness is to ensure that subsequent requests will go to the same server that served the previous ones. The main goal is to ensure that all requests carrying a session cookie will end up on the server which holds this session. Here, as Lukas explained, you're simulating a browser sending many totally independent requests in parallel. There's no reason (nor any way) that any equipment in the chain would guess they are related, since they could arrive in any order, and even end up on multiple nodes.

If despite this that's what you need (for a very obscure reason), then you'd rather use hashing for this. It will ensure that the same distribution algorithm is applied to all these requests regardless of their ordering. But let me tell you that it still makes me feel like you're trying to address the wrong problem.

Also, most people prefer not to apply stickiness for static objects so that they can be retrieved in parallel from all static servers instead of all hammering the same server. It might possibly not be your case based on your explanation, but this is what people usually do for a better user experience.

In conclusion, your expected use case still seems quite obscure to me :-/

Willy
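The hashing behaviour Willy suggests can be illustrated outside haproxy: a deterministic hash of some request property maps every request to the same server with no shared state and no dependency on request ordering. A toy sketch (server names invented; in haproxy this is roughly what `balance source` or `balance uri` with `hash-type consistent` does):

```python
import hashlib

SERVERS = ["web1", "web2", "web3"]  # hypothetical backend servers

def pick_server(src_ip, servers=SERVERS):
    # Hash-based mapping: the same input always lands on the same server,
    # with no stick-table entry to create first.
    h = int(hashlib.sha256(src_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Thirty "parallel" requests from the same client all agree, even though
# no request had to arrive "first" to establish stickiness:
choices = {pick_server("203.0.113.7") for _ in range(30)}
print(len(choices))  # 1
```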
Re: Bug: DNS changes in 1.7.3+ break UNIX socket stats in daemon mode with resolvers on FreeBSD
On Fri, May 12, 2017 at 09:48:56AM +0200, Frederic Lecaille wrote: > On 05/12/2017 09:37 AM, Willy Tarreau wrote: > > On Fri, May 12, 2017 at 08:58:56AM +0200, Lukas Tribus wrote: > > > Hi, > > > > > > > > > Am 11.05.2017 um 21:13 schrieb Jim Pingle: > > > > On 05/11/2017 01:58 PM, Frederic Lecaille wrote: > > > > > I have reproduced (at home) the stats socket issue within a FreeBSD > > > > > 9.3 VM. > > > > > > > > > > Replacing your call to close() by fd_delete() which removes the fd > > > > > from > > > > > the fd set used by kevent *and close it* seems to fix at least the > > > > > stats > > > > > socket issue. I do not know if there are remaining ones. > > > > > > > > > > I did not reproduced the kevent issue revealed by Lukas traces. But I > > > > > had other ones : ERR#57 'Socket is not connected' during sendto(). > > > > > > > > > > I attached a temporary patch to be validated and to let you perhaps > > > > > provide a better one as I have not double check everything. > > > > Fred, > > > > > > > > That seems to have fixed the problem for me. With that patch applied, > > > > web traffic passes and the UNIX socket responds. > > > > > > Confirmed, works for me too. Baptiste? Willy? Is this an acceptable fix? > > > > Yes definitely, not only an acceptable one, but the right fix. I understand > > why it happens to work on linux, by default close() unregisters FDs from > > epoll so it passed below the radar. > > Ok so Willy I will send a well-formed patch asap. Thanks Fred! Willy
Re: Bug: DNS changes in 1.7.3+ break UNIX socket stats in daemon mode with resolvers on FreeBSD
On 05/12/2017 09:52 AM, Willy Tarreau wrote:
> On Fri, May 12, 2017 at 09:48:56AM +0200, Frederic Lecaille wrote:
> > [...]
> > Ok so Willy I will send a well-formed patch asap.
>
> Thanks Fred!
>
> Willy

Here is a better-formed patch. Feel free to amend the commit message if it is not clear enough ;)

Regards,
Fred.

From e6c4a93bbc8838046ab9737bbd5d4be075a72393 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?=
Date: Fri, 12 May 2017 09:57:15 +0200
Subject: [PATCH] BUG/MAJOR: dns: Broken kqueue events handling (BSD systems).

Some DNS related network sockets were closed without unregistering their
file descriptors from their underlying kqueue event sets. This patch
replaces calls to close() by fd_delete() calls so as to delete such
events attached to DNS network sockets from the kqueue before closing
the sockets.
---
 src/dns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dns.c b/src/dns.c
index a118598..cb0a9a9 100644
--- a/src/dns.c
+++ b/src/dns.c
@@ -1004,7 +1004,7 @@ int dns_init_resolvers(int close_socket)
 		if (close_socket == 1) {
 			if (curnameserver->dgram) {
-				close(curnameserver->dgram->t.sock.fd);
+				fd_delete(curnameserver->dgram->t.sock.fd);
 				memset(curnameserver->dgram, '\0', sizeof(*dgram));
 				dgram = curnameserver->dgram;
 			}
-- 
2.1.4
Re: haproxy + RDP
On 11/05/17 at 15:06, Aleksandar Lazic wrote:
> .../
> How about to activate the 'option tcp-check' as mentioned in the
> Warning?
> In the config below is it's commented, any reason why?
>
> It's also active in the doc which you maybe know.
>
> https://www.haproxy.com/doc/aloha/7.0/deployment_guides/microsoft_remote_desktop_services.html
>
> Does this changes anything?

OK, cleaning up a little, I tried:

frontend RDP
    mode tcp
    bind *:3389
    timeout client 1h
    tcp-request inspect-delay 5s
    tcp-request content accept if RDP_COOKIE
    default_backend bk_rdp
#
backend bk_rdp
    mode tcp
    balance leastconn
    #balance rdp_cookie
    timeout server 1h
    timeout connect 4s
    log global
    option tcplog
    stick-table type string len 32 size 10k expire 1h peers pares
    stick on rdp_cookie(msthash)
    # persist rdp-cookie
    option tcp-check
    # option ssl-hello-chk
    # option tcpka
    tcp-check connect port 3389 ssl
    # server gr43sterminal01 10.104.22.142:3389 weight 1 check verify none inter 2000 rise 2 fall 3
    # server gr43sterminal02 10.104.23.141:3389 weight 1 check verify none inter 2000 rise 2 fall 3
    # default-server inter 3s rise 2 fall 3
    server gr43sterminal01 10.104.22.142:3389 weight 1 check
    server gr43sterminal02 10.104.23.141:3389 weight 1 check

And I got:

[ALERT] 131/100222 (8564) : Proxy 'bk_rdp', server 'gr43sterminal01' [/etc/haproxy/haproxy.cfg:189] verify is enabled by default but no CA file specified. If you're running on a LAN where you're certain to trust the server's certificate, please set an explicit 'verify none' statement on the 'server' line, or use 'ssl-server-verify none' in the global section to disable server-side verifications by default.
[ALERT] 131/100222 (8564) : Proxy 'bk_rdp', server 'gr43sterminal02' [/etc/haproxy/haproxy.cfg:190] verify is enabled by default but no CA file specified. If you're running on a LAN where you're certain to trust the server's certificate, please set an explicit 'verify none' statement on the 'server' line, or use 'ssl-server-verify none' in the global section to disable server-side verifications by default.
[ALERT] 131/100222 (8564) : Fatal errors found in configuration.

So I tried adding verify none on the server line, and haproxy sees both servers as up (but one is down).

I tried without ssl:

    tcp-check connect port 3389
    server gr43sterminal01 10.104.22.142:3389 weight 1 check
    server gr43sterminal02 10.104.23.141:3389 weight 1 check

but the result is the same: haproxy sees both servers as up (but one is down).

Only if I leave just "option tcp-check" (or nothing) does it seem to work:

    # persist rdp-cookie
    option tcp-check
    # option ssl-hello-chk
    # option tcpka
    # tcp-check connect port 3389 ssl
    # tcp-check connect port 3389
    # server gr43sterminal01 10.104.22.142:3389 weight 1 check verify none inter 2000 rise 2 fall 3
    # server gr43sterminal02 10.104.23.141:3389 weight 1 check verify none inter 2000 rise 2 fall 3
    # default-server inter 3s rise 2 fall 3
    server gr43sterminal01 10.104.22.142:3389 weight 1 check
    server gr43sterminal02 10.104.23.141:3389 weight 1 check

Output:

[WARNING] 131/102105 (8773) : Server bk_rdp/gr43sterminal01 is DOWN, reason: Layer4 timeout, info: " at initial connection step of tcp-check", check duration: 3001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

-- 
*Antonio Trujillo Carmona*
*Técnico de redes y sistemas.*
*Subdirección de Tecnologías de la Información y Comunicaciones*
Servicio Andaluz de Salud. Consejería de Salud de la Junta de Andalucía
_antonio.trujillo.sspa@juntadeandalucia.es_
Tel. +34 670947670 (747670)
Re: haproxy + RDP
Hi Antonio Trujillo Carmona. Antonio Trujillo Carmona have written on Fri, 12 May 2017 10:23:59 +0200: > El 11/05/17 a las 15:06, Aleksandar Lazic escribió: > > .../ > > How about to activate the 'option tcp-check' as mentioned in the > > Warning? > > In the config below is it's commented, any reason why? > > > > It's also active in the doc which you maybe know. > > > > https://www.haproxy.com/doc/aloha/7.0/deployment_guides/microsoft_remote_desktop_services.html > > > > Does this changes anything? > ok cleaing up a liter I try: > frontend RDP > mode tcp > bind *:3389 > timeout client 1h > tcp-request inspect-delay 5s > tcp-request content accept if RDP_COOKIE > default_backend bk_rdp > # > backend bk_rdp > mode tcp > balance leastconn > #balance rdp_coockie > timeout server 1h > timeout connect 4s > log global > option tcplog > stick-table type string len 32 size 10k expire 1h peers pares > stick on rdp_cookie(msthash) > # persist rdp-cookie > option tcp-check > # option ssl-hello-chk > # option tcpka > tcp-check connect port 3389 ssl > > # server gr43sterminal01 10.104.22.142:3389 weight 1 check > verify none inter 2000 rise 2 fall 3 > # server gr43sterminal02 10.104.23.141:3389 weight 1 check > verify none inter 2000 rise 2 fall 3 > # > default-server inter 3s rise 2 fall 3 > server gr43sterminal01 10.104.22.142:3389 weight 1 check > server gr43sterminal02 10.104.23.141:3389 weight 1 check > > And I got: > [ALERT] 131/100222 (8564) : Proxy 'bk_rdp', server 'gr43sterminal01' > [/etc/haproxy/haproxy.cfg:189] verify is enabled by default but no CA > file specified. If you're running on a LAN where you're certain to > trust the server's certificate, please set an explicit 'verify none' > statement on the 'server' line, or use 'ssl-server-verify none' in > the global section to disable server-side verifications by default. 
> [ALERT] 131/100222 (8564) : Proxy 'bk_rdp', server 'gr43sterminal02' > [/etc/haproxy/haproxy.cfg:190] verify is enabled by default but no CA > file specified. If you're running on a LAN where you're certain to > trust the server's certificate, please set an explicit 'verify none' > statement on the 'server' line, or use 'ssl-server-verify none' in > the global section to disable server-side verifications by default. > [ALERT] 131/100222 (8564) : Fatal errors found in configuration. > > So I try adding verify none in server line > > and haproxy see both server up (but one is down). > I try withou ssl: > > tcp-check connect port 3389 > server gr43sterminal01 10.104.22.142:3389 weight 1 check > server gr43sterminal02 10.104.23.141:3389 weight 1 check > > but the result is the same haproxy see both server up (but one is > down) > > only if I leve only option tcp-check (or none) it seem work > > > # > # persist rdp-cookie > option tcp-check > # option ssl-hello-chk > # option tcpka > # tcp-check connect port 3389 ssl > # tcp-check connect port 3389 > > # server gr43sterminal01 10.104.22.142:3389 weight 1 check > verify none inter 2000 rise 2 fall 3 > # server gr43sterminal02 10.104.23.141:3389 weight 1 check > verify none inter 2000 rise 2 fall 3 > # > default-server inter 3s rise 2 fall 3 > server gr43sterminal01 10.104.22.142:3389 weight 1 check > server gr43sterminal02 10.104.23.141:3389 weight 1 check > ## > > > output: > > [WARNING] 131/102105 (8773) : Server bk_rdp/gr43sterminal01 is DOWN, > reason: Layer4 timeout, info: " at initial connection step of > tcp-check", check duration: 3001ms. 1 active and 0 backup servers > left. 0 sessions active, 0 requeued, 0 remaining in queue. So finally it works. Regards Aleks
[PATCH] MINOR: ssl: support ssl-min-ver and ssl-max-ver with crt-list
Hi,

This patch depends on the "[Patches] TLS methods configuration reworked" series. Currently it will only work with BoringSSL, because haproxy uses a special ssl_sock_switchctx_cbk with a BoringSSL callback to select the certificate before any handshake negotiation. This feature (and others depending on this ssl_sock_switchctx_cbk) could work with OpenSSL 1.1.1 and the new callback https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_early_cb.html.

++
Manu

0001-MINOR-ssl-support-ssl-min-ver-and-ssl-max-ver-with-c.patch
Description: Binary data
Re: [Patches] TLS methods configuration reworked
Hi guys, On Tue, May 09, 2017 at 11:21:36AM +0200, Emeric Brun wrote: > It seems to do what we want, so we can merge it. So the good news is that this patch set now got merged :-) Thanks for your time and efforts back-and-forth on this one! Willy
Re: [PATCH v3] MINOR: ssl: add prefer-client-ciphers
On Thu, May 04, 2017 at 03:45:40PM +0000, Lukas Tribus wrote: > Currently we unconditionally set SSL_OP_CIPHER_SERVER_PREFERENCE [1], > which may not always be a good thing. (...) Now merged, thank you Lukas! Willy
Re: [PATCH]: CLEANUP/MINOR: retire obsoleted USE_GETSOCKNAME build option
On Thu, May 11, 2017 at 01:04:50PM +0300, Dmitry Sivachenko wrote: > Hello, > > this is a patch to nuke obsoleted USE_GETSOCKNAME build option. Applied, thanks Dmitry. BTW, your attached patch was strangely missing a header so I rewrote the commit message since this one was not too hard to guess. Willy
Re: Bug: DNS changes in 1.7.3+ break UNIX socket stats in daemon mode with resolvers on FreeBSD
On Fri, May 12, 2017 at 10:20:56AM +0200, Frederic Lecaille wrote: > Here is a more well-formed patch. > Feel free to amend the commit message if not enough clear ;) It was clear enough, thanks. I added the mention of the faulty commit, that helps tracking backports and credited Jim and Lukas for the investigations. Thanks, Willy
Re: Quick (hopefully) question about clearing stick table entry
Hi Franks, On Wed, May 10, 2017 at 10:29:08AM +, Franks Andy (IT Technical Architecture Manager) wrote: > Hi all, > Is there a way to clear a stick table entry (using socat obviously) by > referring to the individual 'reference' id given at the beginning of the > entry, e.g. "0x7faef417d3ec" ? > Looking at the manual it seems the clearing function is based on key (ip in > my case) or data field - server id etc. I could use the key, but I'm not sure > this will always be individual - I may not always use "stick on src". > Maybe I'm confused :) and IP key IS the best. It's not possible to kill by reference like this. However it would be a bad idea since the entry could be purged and reassigned while you're doing it, resulting in your operation to kill the wrong one. Killing by key remains better as it provides a form of atomicity in the operation. willy
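For reference, killing by key from the CLI looks like this (the table name and address are examples; the key must match the table's declared type):

```
echo "clear table bk_web key 192.0.2.42" | socat stdio /var/run/haproxy.sock
```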
Re: Failed to compile haproxy with lua on Solaris 10
Hi Benoît, On Thu, May 04, 2017 at 08:50:33AM +0200, Benoît GARNIER wrote: (...) > If you do the following operation : time_t => localtime() => struct tm > => timegm() => time_t, your result will be shift by the timezone time > offset (but without any DST applied). > > Technically, if you live in Great Britain, the operation will succeed > during winter (but will offset the result by 1 hour during summer, since > DST is applied here). So in short you're saying that we should always use mktime() instead ? Willy
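Benoît's point is easy to demonstrate in a few lines: pairing localtime() with timegm() shifts the result by the zone's UTC offset, while mktime() is the correct inverse of localtime(). A small sketch with the zone pinned so the offset is predictable (POSIX-only because of tzset()):

```python
import calendar
import os
import time

os.environ["TZ"] = "EST5"   # fixed zone, UTC-5, no DST rules
time.tzset()                # POSIX-only

t = 1_000_000_000
# Wrong pairing: localtime() produces local wall-clock fields, but
# timegm() interprets them as UTC, so the result is shifted by the
# zone's UTC offset.
shifted = calendar.timegm(time.localtime(t))
print(t - shifted)          # 18000 (5 hours)

# Correct pairing: mktime() expects local-time fields and round-trips.
print(int(time.mktime(time.localtime(t))))  # 1000000000
```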
Re: [PATCH] Add b64dec sample converter
Hi Holger, On Sat, May 06, 2017 at 02:08:29AM +0200, Holger Just wrote: > This patch against current master adds a new b64dec converter. It takes > a base64 encoded string and returns its decoded binary representation. > > This converter can be used to e.g. extract the username of a basic auth > header to add it to the log: > > acl BASIC_AUTH hdr_beg(Authorization) "Basic " > http-request capture hdr(Authorization),regsub(^Basic\ ,),b64dec if > BASIC_AUTH It's so obvious it doesn't even need a justification indeed! I even thought we already had it! > I'm open for suggestions for a better name for the converter. > base64_decode might work but doesn't suit the code formatting well and > is pretty long... I didn't find a better one either. > As a note to reviewers: please be aware that I'm not a C programmer at > all and I am way outside of my comfort zone here. As such, this function > might have unhandled edge-cases. I tried to model it according to the > existing base64 converter and my understanding of how the converters are > supposed to work but might have missed something. Thanks for the warning, much appreciated. It made me re-read it after applying it. But your code is fine, no problem detected! So you're becoming a C programmer ;-) > Once verified, I think this converter can be safely added to the > supported stable versions of HAProxy. Yes I think it can make sense to backport it at least to 1.7, it can help sometimes. Thanks! Willy
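For anyone wanting to check what the converter chain does to a Basic auth header, here is the same transformation in a few lines of Python (a sketch, not haproxy's implementation; the header value is just an example):

```python
import base64

def basic_auth_username(authorization_header):
    # Mirrors: regsub(^Basic\ ,),b64dec, then keeping the part before
    # the first ':' of the decoded "user:password" pair.
    prefix = "Basic "
    if not authorization_header.startswith(prefix):
        return None
    decoded = base64.b64decode(authorization_header[len(prefix):]).decode("utf-8")
    return decoded.split(":", 1)[0]

print(basic_auth_username("Basic YWxpY2U6czNjcmV0"))  # alice
```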
Re: Passing SNI value ( ssl_fc_sni ) to backend's verifyhost.
On Tue, May 09, 2017 at 12:12:42AM +0200, Lukas Tribus wrote: > Haproxy can verify the certificate of backend TLS servers since day 1. > > The only thing missing is client SNI based backend certificate > verification, which yes - since we can pass client SNI to the TLS server > - we need to consider for the certificate verification process as well. In fact the cert name is checked, it's just that it can only check against a constant in the configuration. I agree that it's a problem when using SNI. Furthermore it forces one to completely disable verifyhost in case SNI is used. I tend to think that the best approach would be to always enable it when SNI is involved in fact, because if SNI is used to the server, it really means we want to check what cert is provided. This could then possibly be explicitly turned off by the "verify none" directive. I have absolutely no idea how to do that however, I don't know if we can retrieve the previously configured SNI using openssl's API after the connection is established. Willy
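For context, this is the shape of today's configuration, where verifyhost can only check against one constant name even though the SNI forwarded with `sni` varies per request (addresses, names and paths below are examples):

```
backend be_secure
    # verifyhost takes a fixed string: it cannot follow req.hdr(host)
    server s1 192.0.2.10:443 ssl sni req.hdr(host) verify required ca-file /etc/ssl/certs/ca.pem verifyhost www.example.com
```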
Re: [RFC][PATCHES] seamless reload
Hi Pavlos, Olivier, On Mon, May 08, 2017 at 02:34:05PM +0200, Olivier Houchard wrote: > Hi Pavlos, > > On Sun, May 07, 2017 at 12:05:28AM +0200, Pavlos Parissis wrote: > [...] > > Ignore ignore what I wrote, I am an idiot I am an idiot as I forgot the most > > important bit of the test, to enable the seamless reload by suppling the > > HAPROXY_STATS_SOCKET environment variable:-( > > > > I added to the systemd overwrite file: > > [Service] > > Environment=CONFIG="/etc/lb_engine/haproxy.cfg" > > "HAPROXY_STATS_SOCKET=/run/lb_engine/process-1.sock" > > > > and wrk2 reports ZERO errors where with HAPEE reports ~49. > > > > I am terrible sorry for this stupid mistake. > > > > But, this mistake revealed something interesting. The fact that with the > > latest > > code we have more errors during reload. > > > > @Olivier, great work dude. I am waiting for this to be back-ported to > > HAPEE-1.7r1. > > > > Once again I am sorry for my mistake, > > Pavlos > > > > Thanks a lot for testing ! > This is interesting indeed. My patch may make it worse when not passing > fds via the unix socket, as all processes now keep all sockets opened, even > the one they're not using, maybe it make the window between the last > accept and the close bigger. That's very interesting indeed. In fact it's the window between the last accept and the *last* close, due to processes holding the socket while not being willing to do accept anything on it. > If that is so, then the global option "no-unused-socket" should provide > a comparable error rate. In fact William is currently working on the master-worker model to get rid of the systemd-wrapper and found some corner cases between this and your patchset. Nothing particularly difficult, just the fact that he'll need to pass the path to the previous socket to the new processes during reloads. 
During this investigation it was found that we'd need to be able to say that a process possibly has no stats socket and that the next one will not be able to retrieve the FDs. Such information cannot be passed from the command line since it's a consequence of the config parsing. Thus we thought it would make sense to have a per-socket option to say whether or not it would be usable for offering the listening file descriptors, just like we currently have an administrative level on them (I even seem to remember that Olivier first asked if we wouldn't need to do this). And suddenly a few benefits appear when doing this :
  - security freaks not willing to expose FDs over the socket would simply not enable them ;
  - we could restrict the number of processes susceptible of exposing the FDs simply by playing with the "process" directive on the socket ; that could also save some system-wide FDs ;
  - the master process could reliably find the socket's path in the conf (the first one with this new directive enabled), even if it's changed between reloads ;
  - in the default case (no specific option) we wouldn't change the existing behaviour so it would not make existing reloads worse.

Pavlos, regarding the backport to your beloved version, that's planned, but as you can see, while the main technical issues have already been sorted out, there will still be a few small integration-specific changes to come, which is why for now it's still on hold until all these details are sorted out once for all.

Best regards,
Willy
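To make the idea concrete, such a per-socket directive could look like the sketch below. The `expose-fd listeners` keyword shown here is the name this option eventually took in later haproxy versions, so treat it as indicative rather than as syntax available at the time of this thread:

```
global
    # Only this socket may pass the listening FDs to the next process;
    # binding it to process 1 also limits how many processes hold them.
    stats socket /run/haproxy/admin.sock mode 600 level admin expose-fd listeners process 1
```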
[PATCH] Lua medium bugfix
Hi,

A patch fixing a medium bug is attached. The backport to 1.6 and 1.7 is easy: it doesn't generate conflicts.

If a Lua sample-fetch or converter doesn't return any value, an access outside the Lua stack can be performed. This patch checks the stack size before converting the top value to an HAProxy internal sample. A workaround consists in checking that a value is always returned by sample fetches and converters.

This patch should be backported to versions 1.6 and 1.7.

Thierry

From cad53b6e6e2a35202f8086d3239dc2f8891d8944 Mon Sep 17 00:00:00 2001
From: Thierry FOURNIER
Date: Fri, 12 May 2017 16:32:20 +0200
Subject: [PATCH] BUG/MEDIUM: lua: segfault if a converter or a sample doesn't return anything

If a Lua sample-fetch or converter doesn't return any value, an access
outside the Lua stack can be performed. This patch checks the stack size
before converting the top value to a HAProxy internal sample.

A workaround consists in checking that a value is always returned by
sample fetches and converters.

This patch should be backported to versions 1.6 and 1.7.
---
 src/hlua.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/hlua.c b/src/hlua.c
index 643d3fc..b8d2c88 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5496,6 +5496,10 @@ static int hlua_sample_conv_wrapper(const struct arg *arg_p, struct sample *smp,
 	switch (hlua_ctx_resume(stream->hlua, 0)) {
 	/* finished. */
 	case HLUA_E_OK:
+		/* If the stack is empty, the function fails. */
+		if (lua_gettop(stream->hlua->T) <= 0)
+			return 0;
+
 		/* Convert the returned value in sample. */
 		hlua_lua2smp(stream->hlua->T, -1, smp);
 		lua_pop(stream->hlua->T, 1);
@@ -5617,6 +5621,10 @@ static int hlua_sample_fetch_wrapper(const struct arg *arg_p, struct sample *smp
 		stream_int_retnclose(&stream->si[0], &msg);
 		return 0;
 	}
+	/* If the stack is empty, the function fails. */
+	if (lua_gettop(stream->hlua->T) <= 0)
+		return 0;
+
 	/* Convert the returned value in sample. */
 	hlua_lua2smp(stream->hlua->T, -1, smp);
 	lua_pop(stream->hlua->T, 1);
-- 
1.7.10.4
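Until the fix is in place, the workaround mentioned in the commit message (always return a value from Lua sample fetches and converters) can be applied in the script itself. A hypothetical converter, loaded with lua-load:

```lua
-- Hypothetical converter illustrating the workaround: never fall
-- through without a return value, since an empty Lua stack is what
-- triggered the crash before this fix.
core.register_converters("upper_safe", function(value)
    if value == nil then
        return ""      -- always leave something on the stack
    end
    return string.upper(value)
end)
```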
Re: Automatic Certificate Switching Idea
Hi,

On Tue, May 09, 2017 at 07:04:01PM +0200, Daniel Schneller wrote:
> Hi!
>
> > On 9. May. 2017, at 00:30, Lukas Tribus wrote:
> >
> > [...]
> > I'm opposed to heavy feature-bloating for provisioning use-cases, that
> > can quite easily fixed where the fix belongs - the provisioning layer.
>
> You are right, that this can be handled outside / in the provisioning layer.
> And I have no problem implementing it there, if it is considered too narrow a
> niche feature. However, I was curious to see, if this is something that other
> people also need constantly -- sometimes you believe you are in a specific
> bubble, but aren't. But from the amount of feedback the original post
> generated, I think I know my answer already ;-)

In fact I'm less opposed than Lukas here given that I have no idea of the possible impacts nor complexity (but I don't want to have the complete MS Office suite merged in, just Word, Excel and PowerPoint :-)).

I'd tend to say that since we're progressively evolving in a more dynamic world where users want the ability to perform *some* updates without reloading, the day we realize that 90% of the haproxy reloads are only caused by cert updates, we need to think about a way to address this. I remember that Thierry started to look at how to feed a cert from the CLI but apparently it was everything but obvious.

Loading multiple certs could be nice in theory, but there are a few shortcomings to keep in mind :
  - for embedded users you don't want haproxy's date check to become strict because it's frequent that such devices have a totally wrong date. Or at least you want to ensure that you always keep the most recent cert and never kill any outdated one.
  - renewed certs can and will sometimes provide extra alt names, so they are not always 100% equivalent.
  - renewed certs will also change the key size once in a while, and sometimes the algorithm. Technically speaking it might cause difficulties to change this on the fly, or at least some verifications have to be performed at load time.
  - I think that most of the crt-list config is per-certificate file and not per-name. That might also make certain things more complicated to configure.

That said, given that we can already look up a cert based on a name, maybe in fact we could load all of them and just try to find a more recent one if the first one reported by the SNI is outdated. I don't know if that solves everything there.

In any case, this will not provide any benefit regarding let's encrypt or such solutions, because the next cert would have to be known in advance and loaded already, so reloads will have to be performed to take it into account. So I think that the approach making it possible to feed them over the CLI would still be more interesting (and possibly complementary). It could be interesting to study what it would require to implement a "strict-date" option or something like this per certificate to enable checking of their validity during the pick-up.

Still, one point has to be kept in mind. Daniel, I'm pretty sure that most users would prefer the approach consisting in picking the most recent valid cert instead of the last one as you'd like. I don't really know if it's common to issue a cert with a "not-before" date in the future. And that might be the whole point in the end.

Hoping this helps,
Willy
Re: [PATCH] Lua medium bugfix
On Fri, May 12, 2017 at 04:41:48PM +0200, Thierry Fournier wrote: > Hi, > > A patch fixing a medium bugfix in attachment. > The backport in 1.6 and 1.7 is easy: it doesn't generate conflicts. > >In the case of a Lua sample-fetch or converter doesn't return any >value, an acces outside the Lua stack can be performed. This patch >check the stack size before converting the top value to a HAProxy >internal sample. (...) Applied, thank you Thierry. Willy
Re: Limiting bandwidth of connections
Hi Robin, On Wed, May 10, 2017 at 09:15:44PM +, Robin H. Johnson wrote: > Hi, > > I'm wondering about the status of bandwidth limiting that was originally > planned for 1.6. > > In the archives I see discussions in 2012 & 2013; Willy's responses: > 2012-04-17 planned for 1.6: > https://www.mail-archive.com/haproxy@formilux.org/msg07096.html > 2013-05-01 planned for 1.6: > https://www.mail-archive.com/haproxy@formilux.org/msg09812.html

Several of us would like to get it done, and with filters it should be easy and even fun. The most difficult part, I guess, is to define the different limiting classes we want so that we don't change the configuration every other day. What is sure is that the following have already been requested :

 - per connection limits (ie: no user can take more than X MBps)
 - per frontend limits (ie: no hosted service can take more than X Mbps)
 - per backend limits (ie: no hosted customer of a virtual server can take more than X)
 - per process limits (ie: limit total outgoing bandwidth)
 - per track-sc limits (ie: track an entry in a table and the bandwidth there is shared) so that more complex criteria can be set

The 1st and 4th ones have been the most demanded. The first one to limit hungry users and save bandwidth (eg: don't let them preload too much data that they're going to drop when clicking stop on a sound or video player). The fourth one to avoid network drops when the external bandwidth is itself capped on network equipment. You may want to take a stab at this; the filters API is well documented and that may give you some hints about some points we have to think about. We definitely need to make progress on this stuff that has been promised since 1.4.x but never considered urgent. Willy
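Whichever of the limiting classes above gets implemented first, the per-entity accounting underneath is essentially a token bucket. A rough standalone sketch follows; the struct and function names are purely illustrative, this is neither haproxy code nor its filters API:

```c
#include <stdint.h>

/* Token-bucket accounting for one limited entity (connection, frontend,
 * backend, process, or a tracked table entry). */
struct bwlim {
    uint64_t rate;      /* allowed bytes per second */
    uint64_t burst;     /* bucket capacity in bytes */
    uint64_t tokens;    /* currently available bytes */
    uint64_t last_ms;   /* last refill time, in milliseconds */
};

/* Refill the bucket for the elapsed time, then return how many of the
 * <wanted> bytes may be forwarded right now; the remainder has to wait
 * until the next refill. */
static uint64_t bwlim_account(struct bwlim *b, uint64_t now_ms, uint64_t wanted)
{
    uint64_t refill = (now_ms - b->last_ms) * b->rate / 1000;

    /* cap the refill at the bucket capacity to bound bursts */
    b->tokens = (b->tokens + refill > b->burst) ? b->burst : b->tokens + refill;
    b->last_ms = now_ms;

    if (wanted > b->tokens)
        wanted = b->tokens;
    b->tokens -= wanted;
    return wanted;
}
```

The same structure works for all five classes; only where the struct lives (per connection, per proxy, per process, or in a stick-table entry) changes.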
Re: [PATCH] Add b64dec sample converter
Hi Willy, thanks for applying the patch! Willy Tarreau wrote: > Thanks for the warning, much appreciated. It made me re-read it after > applying it. But your code is fine, no problem detected! So you're > becoming a C programmer ;-) Yeah, we will see about that :) >> Once verified, I think this converter can be safely added to the >> supported stable versions of HAProxy. > > Yes I think it can make sense to backport it at least to 1.7, it can > help sometimes. That would be much appreciated. I think a backport even down to 1.6 is pretty risk-free given that the structure there hasn't changed much lately and the patch applies cleanly even on 1.6.0. Cheers, Holger
Re: [PATCH] Add b64dec sample converter
On Fri, May 12, 2017 at 05:39:28PM +0200, Holger Just wrote: > >> Once verified, I think this converter can be safely added to the > >> supported stable versions of HAProxy. > > > > Yes I think it can make sense to backport it at least to 1.7, it can > > help sometimes. > > That would be much appreciated. I think a backport even down to 1.6 is > pretty risk-free given that the structure there hasn't changed much > lately and the patch applies cleanly even on 1.6.0. I have no fear about it being backported to 1.6, the thing is that we normally don't backport any feature anymore to stable branches due to the terrible experience in 1.4 where too much riskless stuff was backported, then fixed, then removed etc... making each subsequent version a pain for certain users. In practice we tend to be a bit flexible and to backport very small stuff that makes people's lives easier or the whole process more reliable (eg: config warnings, ability to quit after a delay on reload), but clearly I don't want to do it past the last release. The reason is simple : if some users are still on 1.6 instead of 1.7, it's precisely because they don't want to get a single change unless it's a real bug. There are some places where changelogs and patches are all read one by one (and non-reg tests run for some time) before deciding to upgrade. And this gives incentive to users of older releases to start to consider new ones :-) Cheers, Willy
Re: [PATCH] Add b64dec sample converter
Hi Willy, Willy Tarreau wrote: > The thing is that we normally don't backport any feature anymore to > stable branches due to the terrible experience in 1.4 where too much > riskless stuff was backported, then fixed, then removed etc... making > each subsequent version a pain for certain users. > > [...] > > And this gives incentive to users of older releases to start to > consider new ones :-) Those are all very good reasons for not backporting the patch. I hadn't considered that an exception is usually not important enough to justify breaking the default of having rock-solid stable versions. That is indeed a very good hard rule to have in a software maintainer's handbook. Thanks for taking the time to explain your reasoning and also for saying no. Cheers, Holger
Re: Automatic Certificate Switching Idea
Willy, thanks for your elaborate reply! See my remarks below. > possible impacts nor complexity (but I don't want to have the complete MS > Office suite merged in, just Word, Excel and PowerPoint :-)). :-D > - renewed certs can and will sometimes provide extra alt names, so >they are not always 100% equivalent. > […] > That said, given that we can already look up a cert based on a name, > maybe in fact we could load all of them and just try to find a more > recent one if the first one reported by the SNI is outdated. I don't > know if that solves everything there. It actually might. In the end it would be something like a map, with the key being the domain, and the value a list of pointers to the actual certificates, sorted by remaining validity, having shortest first. > In any case, this will not provide any benefit regarding let's encrypt > or such solutions, because the next cert would have to be known in > advance and loaded already, so reloads will have to be performed to > take it into account. So I think that the approach making it possible > to feed them over the CLI would still be more interesting (and possibly > complementary). I think it would benefit Let’s Encrypt and similar scenarios. It would still require reloads to pick up newly added certificates. But as renewed certificates overlap their predecessors’ validity period, dropping them into a directory and just doing a reload maybe once a day would work. Clients would still get the older one, until it finally expired, but that should not matter, as we are not talking about revocations where switching to a new cert is wanted quickly. > Daniel I'm pretty sure that most users > would prefer the approach consisting in picking the most recent > valid cert instead of the last one as you'd like. I don't really > know if it's common to issue a cert with a "not-before" date in the > future. And that might be the whole point in the end. Well, I was just thinking about the not-after date. 
In general, from a client perspective it shouldn’t matter to get an older one, until it really expires. And the case where you have a new certificate already, and you want it handed out to clients ASAP is already taken care of today — just replace the file and reload :-) Unless I misunderstood what you meant when referring to the “not-before” date. Daniel PS: This is an interesting discussion, and I am happy to continue it, if anyone feels the same. As I said, I will try to solve this via provisioning scripts in the meantime, so there is no time pressure. -- Daniel Schneller Principal Cloud Engineer CenterDevice GmbH | Hochstraße 11 | 42697 Solingen tel: +49 1754155711| Deutschland daniel.schnel...@centerdevice.de | www.centerdevice.de Geschäftsführung: Dr. Patrick Peschlow, Dr. Lukas Pustina, Michael Rosbach, Handelsregister-Nr.: HRB 18655, HR-Gericht: Bonn, USt-IdNr.: DE-815299431
hostname to IP converter possible?
Hi list, Is there now a converter from hostname to IPv4 available in haproxy? Regards, Igor
Re: Automatic Certificate Switching Idea
On Fri, May 12, 2017 at 06:42:20PM +0200, Daniel Schneller wrote: > > That said, given that we can already look up a cert based on a name, > > maybe in fact we could load all of them and just try to find a more > > recent one if the first one reported by the SNI is outdated. I don't > > know if that solves everything there. > > > It actually might. In the end it would be something like a map, with the > key being the domain, and the value a list of pointers to the actual > certificates, sorted by remaining validity, having shortest first. That's already what is done in the SNI trees, except that the validity date is not considered, the first one matching is retrieved. > I think it would benefit Let's Encrypt and similar scenarios. It would > still require reloads to pick up newly added certificates. But as renewed > certificates overlap their predecessors' validity period, dropping them > into a directory and just doing a reload maybe once a day would work. > Clients would still get the older one, until it finally expired, but that > should not matter, as we are not talking about revocations where > switching to a new cert is wanted quickly. Using the old one "until it expires" is what really causes me a problem (and I understand that in your case that's what you need). There are several reasons for preferring the latest one instead :

 - it might provide stronger algorithms

 - it might use a CA which is not being blacklisted (remember that people started to complain about haproxy.org causing them some warnings because the CA was considered unsafe)

 - it was issued in the past (minutes, hours, days) so is likely already valid regardless of any small time shift. Using the old one even one minute past its validity date will be a big problem.
- the change will be effective at the moment of reload, meaning that any surprise such as an incomplete chain, incorrect OCSP, or a key size incompatible with certain browsers, will be identified at an expected moment and when it's not too late to fix it. By using the oldest one as long as possible, it would break at any time in the middle of the night and would do it once you cannot roll back. And that's the point. Users praise haproxy's reliability but in fact it's not (just) the code's reliability (git log --grep BUG shames us), but the fact that it has always been designed to be used by humans, who make mistakes and who want to spot them very quickly and to fix them before they become big trouble. Config warnings/errors, checks for suspicious constructs and logs are directly involved here. And we do know that our users occasionally fail and we must help them recover, and even possibly cover their mistakes before the boss or the customer has any chance to notice. So creating something designed to fail by default behind their back without prior notice and without the ability to quickly stop before anyone notices is contrary to the philosophy here. That doesn't mean that what you need must not be implemented, it means that under no circumstance it should be the default nor happen to be enabled by default. Thus I think that at a minimum, if we ever go in that direction, the default behaviour must be the expected one (ie: use the most recent valid cert), and maybe there could be an option to prefer the old one instead and to apply a date margin (eg: avoid using this one if there's less than a day left). (...) > PS: This is an interesting discussion, and I am happy to continue > it, if anyone feels the same. I would not be surprised if we get some followups in either direction. Over the mid term, more and more people will be affected by related situations and the whole aspect of cert renewal will eventually become hot. 
But I strongly doubt we'll do anything for this in 1.8, though collecting views, ideas and constraints can be useful to try to serve everyone the best later. > As I said, I will try to solve this via > provisioning scripts in the meantime, so there is no time pressure. That's perfect! Your feedback and possible trouble in doing this will also definitely help! thanks, Willy
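To make the proposed default concrete, the rule "use the most recent currently-valid cert, and fall back to the one expiring last rather than serving nothing" could be sketched as follows. The struct and function names are illustrative only, not haproxy code:

```c
#include <stddef.h>
#include <time.h>

/* Minimal view of a loaded certificate: its validity window. */
struct cert_info {
    time_t not_before;
    time_t not_after;
};

/* Among the certs matching one SNI name, pick the most recently issued
 * one that is valid at <now>; if none is valid, fall back to the one
 * expiring last so that we still serve something. */
static const struct cert_info *pick_cert(const struct cert_info *certs,
                                         size_t n, time_t now)
{
    const struct cert_info *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (now < certs[i].not_before || now > certs[i].not_after)
            continue;                          /* not yet / no longer valid */
        if (!best || certs[i].not_before > best->not_before)
            best = &certs[i];                  /* prefer most recently issued */
    }
    if (!best)                                 /* nothing valid: degrade gracefully */
        for (i = 0; i < n; i++)
            if (!best || certs[i].not_after > best->not_after)
                best = &certs[i];
    return best;
}
```

The option discussed above (prefer the old cert, with a date margin) would only change the comparison in the first loop, not the overall shape.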
Re: haproxy not creating stick-table entries fast enough
On Fri, May 12, 2017 at 12:51 AM, Willy Tarreau wrote: > On Tue, May 09, 2017 at 09:43:22PM -0700, redundantl y wrote: > > For example, I have tried with the latest versions of Firefox, Safari, > and > > Chrome. With 30 elements on the page being loaded from the server > they're > > all being loaded within 70ms of each other, the first 5 or so happening > on > > the same millisecond. I'm seeing similar behaviour, being sent to > > alternating backend servers until it "settles" and sticks to just one. > > That's only true after the browser starts to retrieve the main page which > gives it the indication that it needs to request such objects. You *always* > have a first request before all other ones. The browser cannot guess it > will have to retrieve many objects out of nowhere. > > As I've said before, the issue here is these objects aren't hosted on the same server that they're being called from. "A separately hosted application will generate HTML with several (20-30) elements that will be loaded simultaneously by the end user's browser." So a user might go to www.example.com and that page will load the objects from assets.example.com, which is a wholly separate server. > The principle of stickiness is to ensure that subsequent requests will go > to the same server that served the previous ones. The main goal is to > ensure that all requests carrying a session cookie will end up on the > server which holds this session. > > Here as Lukas explained, you're simulating a browser sending many totally > independant requests in parallel. There's no reason (nor any way) that > any equipment in the chain would guess they are related since they could > arrive in any order, and even end up on multiple nodes. > > Well, all of these requests will have the url_param email=, so the load balancer has the ability to know they are related. 
The issue here, at least how it appears to me, is since they come in so fast the stick-table entry doesn't get generated quickly enough and the requests get distributed to multiple backend servers and eventually stick to just one. > If despite this that's what you need (for a very obscure reason), then > you'd rather use hashing for this. It will ensure that the same > distribution > algorithm is applied to all these requests regardless of their ordering. > But > let me tell you that it still makes me feel like you're trying to address > the wrong problem. > > Since changing to load balancing on the url_param our issue has been resolved. > Also, most people prefer not to apply stickiness for static objects so that > they can be retrieved in parallel from all static servers instead of all > hammering the same server. It might possibly not be your case based on your > explanation, but this is what people usually do for a better user > experience. > > The objects aren't static. When they're loaded the application makes some calls to external services (3rd party application, database server) to produce the desired objects and links. > In conclusion, your expected use case still seem quite obscure to me :-/ > > Willy > I agree, our use case is fairly unique.
Re: haproxy not creating stick-table entries fast enough
On Fri, May 12, 2017 at 10:20:02AM -0700, redundantl y wrote: > As I've said before, the issue here is these objects aren't hosted on the > same server that they're being called from. > > "A separately hosted application will generate HTML with several (20-30) > elements that will be loaded simultaneously by the end user's browser." > > So a user might go to www.example.com and that page will load the objects > from assets.example.com, which is a wholly separate server. OK but *normally* if there's parallelism when downloading objects from assets.example.com, then there's no dependency between them. > > The principle of stickiness is to ensure that subsequent requests will go > > to the same server that served the previous ones. The main goal is to > > ensure that all requests carrying a session cookie will end up on the > > server which holds this session. > > > > Here as Lukas explained, you're simulating a browser sending many totally > > independent requests in parallel. There's no reason (nor any way) that > > any equipment in the chain would guess they are related since they could > > arrive in any order, and even end up on multiple nodes. > > > > > Well, all of these requests will have the url_param email=, so the load > balancer has the ability to know they are related. The issue here, at > least how it appears to me, is since they come in so fast the stick-table > entry doesn't get generated quickly enough and the requests get distributed > to multiple backend servers and eventually stick to just one. It's not fast WRT the stick table but WRT the time to connect to the server. As I mentioned, the principle of stickiness is to send subsequent requests to the same server which *served* the previous ones. So if the first request is sent to server 1, the connection fails several times, then it's redispatched to server 2 and succeeds, it will be server 2 which will be put into the table so that next connections will go there as well. 
In your workload, there isn't even the time to validate the connection to the server, and *this* is what causes the problem you're seeing. > Since changing to load balancing on the url_param our issue has been > resolved. So indeed you're facing the type of workloads requiring a hash. > > Also, most people prefer not to apply stickiness for static objects so that > > they can be retrieved in parallel from all static servers instead of all > > hammering the same server. It might possibly not be your case based on your > > explanation, but this is what people usually do for a better user > > experience. > > > > > The objects aren't static. When they're loaded the application makes some > calls to external services (3rd party application, database server) to > produce the desired objects and links. OK I see. Then better stick to the hash using url_param. You can improve this by combining it with stick anyway if your url_params are frequently reused (eg: many requests per client). This will avoid redistributing innocent connections in the event a server is added or removed due to the hash being recomputed. That can be especially true if your 3rd party application sometimes has long response times and the probability of a server outage between the first and the last request for a client becomes high. > > In conclusion, your expected use case still seem quite obscure to me :-/ > > Willy > I agree, our use case is fairly unique.
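For the archives, the hash+stick combination suggested above could look roughly like the following config sketch (the backend name, server addresses and table sizing are made up, and it is untested):

```
# Hash on the email parameter for distribution, but also record the
# chosen server in a stick table so that clients already mapped to a
# server survive a hash redistribution when a server is added/removed.
backend app
    balance url_param email
    stick-table type string len 64 size 100k expire 30m
    stick on url_param(email)
    server s1 10.0.0.1:80 check
    server s2 10.0.0.2:80 check
```

The stick rule takes precedence when an entry exists; the url_param hash only decides the server for parameters not yet in the table.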
Re: haproxy not creating stick-table entries fast enough
On Fri, May 12, 2017 at 10:46 AM, Willy Tarreau wrote: > On Fri, May 12, 2017 at 10:20:02AM -0700, redundantl y wrote: > > As I've said before, the issue here is these objects aren't hosted on the > > same server that they're being called from. > > > > "A separately hosted application will generate HTML with several (20-30) > > elements that will be loaded simultaneously by the end user's browser." > > > > So a user might go to www.example.com and that page will load the > objects > > from assets.example.com, which is a wholly separate server. > > OK but *normally* if there's parallelism when downloading objects from > assets.example.com, then there's no dependency between them. > > > > The principle of stickiness is to ensure that subsequent requests will > go > > > to the same server that served the previous ones. The main goal is to > > > ensure that all requests carrying a session cookie will end up on the > > > server which holds this session. > > > > > > Here as Lukas explained, you're simulating a browser sending many > totally > > > independent requests in parallel. There's no reason (nor any way) that > > > any equipment in the chain would guess they are related since they > could > > > arrive in any order, and even end up on multiple nodes. > > > > > > > > Well, all of these requests will have the url_param email=, so the load > > balancer has the ability to know they are related. The issue here, at > > least how it appears to me, is since they come in so fast the stick-table > > entry doesn't get generated quickly enough and the requests get > distributed > > to multiple backend servers and eventually stick to just one. > > It's not fast WRT the stick table but WRT the time to connect to the > server. > As I mentioned, the principle of stickiness is to send subsequent requests > to the same server which *served* the previous ones. 
So if the first > request > is sent to server 1, the connection fails several times, then it's > redispatched > to server 2 and succeeds, it will be server 2 which will be put into the > table > so that next connections will go there as well. > > In your workload, there isn't even the time to validate the connection to > the > server, and *this* is what causes the problem you're seeing. > > Thank you for explaining what I'm seeing. This makes a lot of sense. > > Since changing to load balancing on the url_param our issue has been > > resolved. > > So indeed you're facing the type of workloads requiring a hash. > > > > Also, most people prefer not to apply stickiness for static objects so > that > > > they can be retrieved in parallel from all static servers instead of > all > > > hammering the same server. It might possibly not be your case based on > your > > > explanation, but this is what people usually do for a better user > > > experience. > > > > > > > > The objects aren't static. When they're loaded the application makes > some > > calls to external services (3rd party application, database server) to > > produce the desired objects and links. > > OK I see. Then better stick to the hash using url_param. You can improve > this by combining it with stick anyway if your url_params are frequently > reused (eg: many requests per client). This will avoid redistributing > innocent connections in the event a server is added or removed due to > the hash being recomputed. That can be especially true if your 3rd party > application sometimes has long response times and the probability of a > server outage between the first and the last request for a client becomes > high. > > Thank you for pointing this out, we hadn't considered this scenario. > > > In conclusion, your expected use case still seem quite obscure to me > :-/ > > > > > > Willy > > > > > > > I agree, our use case is fairly unique. 
> > It looks so :-) > > Willy > Thanks for taking the time to read and respond. It was very informative and helpful.
Re: haproxy
> On May 11, 2017, at May 11, 7:51 AM, Jose Alarcon > wrote: > > Hello, > > excuse me, my English is very bad. I need to know how to change the haproxy configuration from passive to active manually, without using keepalived. > There is no standard way because that is not a feature of haproxy. High availability of the proxy is managed by an external tool like keepalived. -Bryan > I need this information for high school homework. > > Thanks. > > My native language is Spanish.
Secondary load balancing method (fallback)
Is it possible to configure a secondary load balancing method, something to fall back on if the first method's criterion isn't met? For example, if I balance on the url_param email:

    balance url_param email

Can it instead balance on another url_param:

    balance url_param id

Or have it balance based on source address? I tried setting the following:

    balance url_param email
    balance url_param id

But it only balanced on the second one, id. I haven't found anything saying this is possible, but I'd just like to make sure it isn't. Thanks.
Re: OpenSSL engine and async support
> On May 10, 2017, at 04:51, Emeric Brun wrote: > >> It looks like the main process stalls at DH_free(local_dh_1024) (part of >> __ssl_sock_deinit). Not sure why but I will debug and report back. >> >> Thanks, > > I experienced the same issue (stalled on a futex) if i run haproxy in > foreground and trying to kill it with kill -USR1. > > With this conf (dh param and ssl-async are disabled) > global > # tune.ssl.default-dh-param 2048 >ssl-engine qat > # ssl-async >nbproc 1 It looks like that the stall on futex issue is related to DH_free() calling ENGINE_finish in openssl 1.1: https://github.com/openssl/openssl/blob/master/crypto/dh/dh_lib.c#L109 (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132 #1 0x7fa1582c5571 in pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:118 #2 0x7fa158a58559 in CRYPTO_THREAD_write_lock () from /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1 #3 0x7fa1589d8800 in ENGINE_finish () from /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1 #4 0x7fa158975e76 in DH_free () from /tmp/openssl_1.1.0_install/lib/libcrypto.so.1.1 #5 0x00417c78 in free_dh () at src/ssl_sock.c:7905 #6 0x7fa1591d58ce in _dl_fini () at dl-fini.c:254 #7 0x7fa158512511 in __run_exit_handlers (status=0, listp=0x7fa15888e688, run_list_atexit=true) at exit.c:78 #8 0x7fa158512595 in __GI_exit (status=) at exit.c:100 #9 0x00408814 in main (argc=4, argv=0x7ffe72188548) at src/haproxy.c:2235 Openssl 1.1 has changed the way ENGINE_cleanup works: https://www.openssl.org/docs/man1.1.0/crypto/ENGINE_cleanup.html "From OpenSSL 1.1.0 it is no longer necessary to explicitly call ENGINE_cleanup and this function is deprecated. Cleanup automatically takes place at program exit." I suppose by the time the destructor __ssl_sock_deinit is called, engine-related cleanup are already done by openssl and ENGINE_finish (from DH_free) stalls on a non-existing write lock. 
I have a workaround which moves the DH_free logic out of the destructor __ssl_sock_deinit and calls it right before process exit. With the workaround I no longer see the stall issue. I am not sure whether it is the optimal solution though. Let me know. Thanks, Grant
Re: Secondary load balancing method (fallback)
On Fri, May 12, 2017 at 02:00:24PM -0700, redundantl y wrote: > Is it possible to configure a secondary load balancing method, something to > fall back on if the first method isn't met? > > For example, if I balance on the url_param email: > > balance url_param email > > Can it instead balance on another url_param: > > balance url_param id > > Or have it balance based on source address? > > I tried setting the following: > > balance url_param email > balance url_param id > > But it only balanced on the second one, id. > > I haven't found anything saying this is possible, but I'd just like to make > sure it isn't. You can't do this, and there is already a fallback on algorithms involving hashes, but the fallback is to round robin, as indicated in the doc. What you can do however is to check in your frontend if you have this parameter and use a specific backend when it is present, or another one when it is not present. Eg:

    frontend blah
        use_backend lb-email if { url_param(email) -m found }
        use_backend lb-id if { url_param(id) -m found }
        default_backend lb-src

    backend lb-email
        balance url_param email
        server s1 1.1.1.1 track lb-src/s1
        server s2 1.1.1.2 track lb-src/s2

    backend lb-id
        balance url_param id
        server s1 1.1.1.1 track lb-src/s1
        server s2 1.1.1.2 track lb-src/s2

    backend lb-src
        balance src
        server s1 1.1.1.1 check
        server s2 1.1.1.2 check

Willy
Re: hostname to IP converter possible?
Hi Igor, On Sat, May 13, 2017 at 12:58:19AM +0800, Igor Pav wrote: > Hi list, > > Is now there's a converter for hostname to IPv4 available in haproxy? Funny that you asked the same question one year ago, but you didn't get a response, you are patient :-) Server addresses can be dynamically resolved now, but that's all. Baptiste is improving the DNS infrastructure so that it depends less on the servers, and maybe in the future it might become possible to use it for other features (eg: a converter) but for now it is not. Regards, Willy
Re: Failed to compile haproxy with lua on Solaris 10
Le 12/05/2017 à 15:54, Willy Tarreau a écrit : > Hi Benoît, > > On Thu, May 04, 2017 at 08:50:33AM +0200, Benoît GARNIER wrote: > (...) >> If you do the following operation : time_t => localtime() => struct tm >> => timegm() => time_t, your result will be shifted by the timezone time >> offset (but without any DST applied). >> >> Technically, if you live in Great Britain, the operation will succeed >> during winter (but will offset the result by 1 hour during summer, since >> DST is applied here). > So in short you're saying that we should always use mktime() instead ? > > Willy No, not at all !!! To sum up, these are the basic functions to work with time :

 - time() returns a time_t, which is timezone agnostic since it's just a precise moment in time (it represents the same moment for everybody)

 - localtime() takes this time_t and builds a representation of this time in the current time zone (struct tm)

 - mktime() takes a struct tm representing a specific time in the current timezone and returns a time_t

gmtime() and timegm() are the same as localtime() and mktime() but ignore the timezone and DST: they only work with UTC time. So you can use timegm() on a struct tm only if you know that struct tm represents a GMT time (for example if it was built with gmtime()). Similarly, using mktime() is only valid if this struct tm represents the time in the current time zone (i.e. if it was built with localtime() with the same timezone). For example if you parse a log file with GMT time in it you'll use timegm() to build a time_t representing the precise time of the log. If you parse a file with local time in it, you'll use mktime(), but you'll also have to know what was the timezone used to build it.

1) Time zone agnostic: time()
2) Current time zone: localtime() and mktime()
3) UTC time: gmtime() and timegm()

As a rule of thumb, you cannot mix functions in categories 2 and 3. Benoît
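Benoît's three categories can be illustrated with a small C sketch. Note that timegm() is a nonstandard extension (glibc/BSD), which is exactly the portability issue discussed later in this thread:

```c
#define _DEFAULT_SOURCE     /* for timegm() on glibc */
#include <time.h>

/* Category 2 round-trip: local struct tm back through mktime(). */
static int roundtrip_local(time_t t)
{
    struct tm loc = *localtime(&t);   /* category 2 */
    return mktime(&loc) == t;         /* category 2: valid, returns 1 */
}

/* Category 3 round-trip: UTC struct tm back through timegm(). */
static int roundtrip_utc(time_t t)
{
    struct tm utc = *gmtime(&t);      /* category 3 */
    return timegm(&utc) == t;         /* category 3: valid, returns 1 */
}

/* Mixing categories 3 and 2: mktime() applied to a UTC struct tm is
 * shifted by the local UTC offset (and possibly DST), which is the bug
 * described for the localtime() => timegm() chain above, in reverse. */
static long mix_error(time_t t)
{
    struct tm utc = *gmtime(&t);      /* category 3 */
    return (long)(mktime(&utc) - t);  /* nonzero unless TZ is UTC */
}
```

On a machine with TZ set to a non-UTC zone, mix_error() returns the offset in seconds; both round-trip helpers return 1 everywhere.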
Re: hostname to IP converter possible?
Thanks, Willy. I found the DNS infrastructure improved a lot this year, so I asked again; I hope it was not too stupid :-) On Sat, May 13, 2017 at 7:19 AM, Willy Tarreau wrote: > Hi Igor, > > On Sat, May 13, 2017 at 12:58:19AM +0800, Igor Pav wrote: >> Hi list, >> >> Is now there's a converter for hostname to IPv4 available in haproxy? > > Funny that you asked the same question one year ago, but you didn't get > a response, you are patient :-) > > Server addresses can be dynamically resolved now, but that's all. Baptiste > is improving the DNS infrastructure so that it depends less on the servers, > and maybe in the future it might become possible to use it for other features > (eg: a converter) but for now it is not. > > Regards, > Willy
Re: Failed to compile haproxy with lua on Solaris 10
Hi Benoît, On Sat, May 13, 2017 at 07:32:10AM +0200, Benoît GARNIER wrote: > Le 12/05/2017 à 15:54, Willy Tarreau a écrit : > > Hi Benoît, > > > > On Thu, May 04, 2017 at 08:50:33AM +0200, Benoît GARNIER wrote: > > (...) > >> If you do the following operation : time_t => localtime() => struct tm > >> => timegm() => time_t, your result will be shift by the timezone time > >> offset (but without any DST applied). > >> > >> Technically, if you live in Great Britain, the operation will succeed > >> during winter (but will offset the result by 1 hour during summer, since > >> DST is applied here). > > So in short you're saying that we should always use mktime() instead ? > > > > Willy > > No, not at all !!! To sum up, these are the basic functions to work with > time : > > - time() return a time_t which is timezone agnostic since it's just a > precise moment in time (it represents the same moment for everybody) > > - localtime() takes this time_t and build a representation of this time > in the current time zone (struct tm) > > - mktime() take a struct tm representing a specific time in the current > timezone et return a time_t > > gmtime() and timegm() are the same as localtime() and mktime() but will > ignore the timezone and DST: they only work with UTC time. > > So you can use timegm() on a struct tm only if you know that struct tm > represents a GMT time (for example if it was build with gmtime()). > > Similarly, using mktime() is only valid if this struct tm represents the > time in the current time zone (i.e. if it was build with localtime() > with the same timezone). > > For example if you parse a log file with GMT time in it you'll use > timegm() to build a time_t representing the precise time of the log. > > If you parse a file with local time in it, you'll use mktime() but > you'll also have to know what was the timezone used to build it. 
> > 1) Time zone agnostic: time() > 2) Current time zone: localtime() and mktime() > 3) UTC time: gmtime() and timegm() > > As a rule of thumb, you cannot mix functions from categories 2 and 3. OK, thank you, that's perfectly clear now and it makes sense. Thierry told me that he purposely used timegm() because he wanted UTC. Man pages recommend not using it because it's obsolete and suggest setenv(TZ)+tzset()+mktime() instead!!! It's amazing to read something that stupid in man pages written 10 years after everyone started to write threaded applications! I found that in practice many people use a hand-written timegm() function which does all the computation by hand, just like those of us who knew MS-DOS used to do 30 years ago, so I think we'll have to go down that route for a more portable implementation :-/ At least this will also save us from accidentally using implementations where timegm() is a wrapper around setenv+tzset+mktime()... Thanks for your explanations! Willy
Re: Failed to compile haproxy with lua on Solaris 10
On 13/05/2017 at 08:09, Willy Tarreau wrote: > Hi Benoît, > > On Sat, May 13, 2017 at 07:32:10AM +0200, Benoît GARNIER wrote: >> On 12/05/2017 at 15:54, Willy Tarreau wrote: >>> Hi Benoît, >>> >>> On Thu, May 04, 2017 at 08:50:33AM +0200, Benoît GARNIER wrote: >>> (...) If you do the following operation: time_t => localtime() => struct tm => timegm() => time_t, your result will be shifted by the timezone offset (but without any DST applied). Technically, if you live in Great Britain, the operation will succeed during winter (but will offset the result by 1 hour during summer, since DST is applied there). >>> So in short you're saying that we should always use mktime() instead? >>> >>> Willy >> No, not at all!!! To sum up, these are the basic functions to work with >> time: >> >> - time() returns a time_t which is timezone agnostic, since it's just a >> precise moment in time (it represents the same moment for everybody) >> >> - localtime() takes this time_t and builds a representation of this time >> in the current time zone (struct tm) >> >> - mktime() takes a struct tm representing a specific time in the current >> timezone and returns a time_t >> >> gmtime() and timegm() are the same as localtime() and mktime() but >> ignore the timezone and DST: they only work with UTC time. >> >> So you can use timegm() on a struct tm only if you know that struct tm >> represents a GMT time (for example if it was built with gmtime()). >> >> Similarly, using mktime() is only valid if this struct tm represents the >> time in the current time zone (i.e. if it was built with localtime() >> with the same timezone). >> >> For example, if you parse a log file with GMT times in it, you'll use >> timegm() to build a time_t representing the precise time of the log. >> >> If you parse a file with local times in it, you'll use mktime(), but >> you'll also have to know which timezone was used to build it. 
>> >> 1) Time zone agnostic: time() >> 2) Current time zone: localtime() and mktime() >> 3) UTC time: gmtime() and timegm() >> >> As a rule of thumb, you cannot mix functions from categories 2 and 3. > OK, thank you, that's perfectly clear now and it makes sense. Thierry > told me that he purposely used timegm() because he wanted UTC. Man > pages recommend not using it because it's obsolete and suggest > setenv(TZ)+tzset()+mktime() instead!!! It's amazing to read something > that stupid in man pages written 10 years after everyone started to > write threaded applications! I found that in practice many people > use a hand-written timegm() function which does all the computation > by hand, just like those of us who knew MS-DOS used to do 30 > years ago, so I think we'll have to go down that route for a more > portable implementation :-/ At least this will also save us from > accidentally using implementations where timegm() is a wrapper around > setenv+tzset+mktime()... Time handling is not easy. I hate to say it, but POSIX and glibc manage to make it even harder, especially the global timezone handling, which is not thread-safe, as you pointed out. Anyway, free-coding a simple timegm() is not very hard, since you don't have to take any timezone into account, only leap years. But beware that the real timegm() (and mktime()) perform some normalization on tm_mday (day of the month) and tm_mon (month). For example, they will happily accept fake dates like "March 32nd 2017", "February 60th 2017" or even "the 1st day of the 16th month of 2016", and will convert them all to "April 1st 2017" internally. It's very handy when you want to compute a date in the future or in the past: you just add/subtract values to the corresponding field (day or month) and let mktime() or timegm() do their magic trick. Benoît