Re: Health check logging differences between 1.9 and 2.0

2020-10-29 Thread Veiko Kukk

On 2020-10-24 10:32, Willy Tarreau wrote:
That sounds strange, I don't like this. This sounds like an uninitialized
variable. Did you observe that the facility used is stable inside a backend
for example, or does it seem to be affected by other activity ?


After investigation, it appears that in the case of the master-worker model 
(-Ws) and systemd Type=notify, the master process duplicates some worker 
messages, prepending the severity string, something that appears to be a 
counter (in the form of 302/104225) and the worker pid, and emits them with 
syslog facility 'daemon'.


'log /dev/log local0' is configured in the global section; there is no other 
'log' statement in the haproxy config.


haproxy-daemon.log:
2020-10-29T10:36:41.503040+00:00 hostname.tld haproxy[17274]: [WARNING] 
302/103641 (19612) : Health check for server proxy_upstream_ssl/ovh_sbg 
succeeded, reason: Layer7 check passed, code: 200, info: "OK", check 
duration: 85ms, status: 3/3 UP.
2020-10-29T10:42:25.488975+00:00 hostname.tld haproxy[17274]: [WARNING] 
302/104225 (19612) : Stopping proxy proxy_upstream_ssl in 0 ms.
2020-10-29T10:42:25.490817+00:00 hostname.tld haproxy[17274]: [WARNING] 
302/104225 (19612) : Proxy proxy_upstream_ssl stopped (FE: 14 conns, BE: 
14 conns).


haproxy-local0.log:
2020-10-29T10:36:41.502693+00:00 hostname.tld haproxy[19612]: Health 
check for server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 
check passed, code: 200, info: "OK", check duration: 85ms, status: 3/3 
UP.
2020-10-29T10:42:25.485245+00:00 hostname.tld haproxy[17274]: Proxy 
proxy_upstream_ssl started.
2020-10-29T10:42:25.492269+00:00 hostname.tld haproxy[19612]: Stopping 
proxy proxy_upstream_ssl in 0 ms.
2020-10-29T10:42:25.494013+00:00 hostname.tld haproxy[19612]: Proxy 
proxy_upstream_ssl stopped (FE: 14 conns, BE: 14 conns).


I'm pretty sure that duplication of log messages should not happen. Or 
is it indeed intended?


Best regards,
Veiko



Re: Health check logging differences between 1.9 and 2.0

2020-10-28 Thread Veiko Kukk

On 2020-10-28 13:11, Veiko Kukk wrote:

Another difference between 1.9 and 2.0 here is that 2.0 is compiled
with systemd support and executed using -Ws and Type=notify instead of
1.9 -W and Type=forking.


With the exact same HAproxy 2.0 (compiled with systemd support), I've 
changed haproxy.service to have an identical configuration to the 1.9 one:


ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE
Type=forking

Log messages are not duplicated anymore, and there are no more priority 30 
messages!
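
For reference, the same change can be applied as a systemd drop-in instead of 
editing the unit file directly (a sketch; it assumes the stock unit defines 
$CONFIG and $PIDFILE as above):

# /etc/systemd/system/haproxy.service.d/forking.conf
[Service]
Type=forking
ExecStart=
ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE

# apply with:
#   systemctl daemon-reload && systemctl restart haproxy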


Veiko



Re: Health check logging differences between 1.9 and 2.0

2020-10-28 Thread Veiko Kukk

On 2020-10-26 13:57, Christopher Faulet wrote:

health-check log messages are emitted in the same way in 1.9 and 2.0.
And at first glance, the code responsible to set the syslog priority
is the same too. So there is probably something we missed. Could you
confirm you still have the same issue with the above configuration and
a netcat as syslog server ? If it works as expected, please share your
configuration, not only the global and defaults sections.


I cannot reproduce the issue with netcat, neither with the simple 
configuration you provided nor with our more complex test server config.


I've reconfigured rsyslog to log raw messages into different files, each 
suffixed with the syslog facility name:
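
Roughly, the rsyslog side looks like this (a sketch; the file paths and 
template name are my own, and the %rawmsg% property keeps the <PRI> prefix 
visible):

# /etc/rsyslog.d/haproxy-facilities.conf
template(name="rawfmt" type="string" string="%rawmsg%\n")
daemon.*   /var/log/haproxy-daemon.log;rawfmt
local0.*   /var/log/haproxy-local0.log;rawfmt
user.*     /var/log/haproxy-user.log;rawfmt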


$ grep "proxy_upstream_ssl" haproxy-*
haproxy-daemon.log:<30>Oct 28 12:57:19 haproxy[13420]: [WARNING] 
301/125719 (13424) : Health check for server proxy_upstream_ssl/ovh_sbg 
succeeded, reason: Layer7 check passed, code: 200, info: "OK", check 
duration: 30ms, status: 3/3 UP.
haproxy-daemon.log:<30>Oct 28 12:57:20 haproxy[13420]: [WARNING] 
301/125720 (13424) : Health check for server proxy_upstream_ssl/ovh_bhs 
succeeded, reason: Layer7 check passed, code: 200, info: "OK", check 
duration: 423ms, status: 3/3 UP.
haproxy-local0.log:<133>Oct 28 12:57:19 haproxy[13420]: Proxy 
proxy_upstream_ssl started.
haproxy-local0.log:<133>Oct 28 12:57:19 haproxy[13424]: Health check for 
server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 check 
passed, code: 200, info: "OK", check duration: 30ms, status: 3/3 UP.
haproxy-local0.log:<133>Oct 28 12:57:20 haproxy[13424]: Health check for 
server proxy_upstream_ssl/ovh_bhs succeeded, reason: Layer7 check 
passed, code: 200, info: "OK", check duration: 423ms, status: 3/3 UP.


This means my initial assumption that all health check logs are emitted 
differently was wrong.


Strangely, the log line formats are also different. haproxy-daemon.log is 
emitted by the master process running as user root (13420), and it duplicates 
health check log messages from its subprocess running as user haproxy 
(13424), prepending the severity name and the subprocess pid. Between those 
is '301/125719'; I don't know what it is.


Another difference between 1.9 and 2.0 here is that 2.0 is compiled with 
systemd support and executed using -Ws and Type=notify instead of 1.9 -W 
and Type=forking.


global
  log /dev/log local0
  daemon
  nbproc 1
  nbthread 2

I wonder if systemd-journald is duplicating messages here.
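
One way to check would be to look at the journal fields for the haproxy unit 
directly (assuming the unit is named haproxy.service; the fields shown are 
standard journald fields):

journalctl -u haproxy.service -o verbose | grep -E 'SYSLOG_FACILITY=|_PID=|MESSAGE='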

Best regards,
Veiko



Re: Health check logging differences between 1.9 and 2.0

2020-10-23 Thread Veiko Kukk

On 2020-10-22 10:38, Veiko Kukk wrote:

Indeed, in HAproxy 2.0, 'option log-health-checks' messages are
emitted using syslog facility 'daemon' and not the facility configured
with global configuration keyword 'log'.
In 1.9, health check logs were emitted as defined by 'log' facility 
value.


I was too quick to conclude that health check logs are emitted as 
'daemon'. Sometimes they are also emitted as 'user'.


Veiko



Re: Health check logging differences between 1.9 and 2.0

2020-10-22 Thread Veiko Kukk

On 2020-10-20 11:56, Veiko Kukk wrote:
I've upgraded some servers from 1.9.15 to 2.0.18. Log config is very 
simple.

...

Without any changes to rsyslog configuration/filters, health checks
are now filtered to /var/log/messages and not into specified haproxy
log files as was before.


Answering to my own question.

Indeed, in HAproxy 2.0, 'option log-health-checks' messages are emitted 
using syslog facility 'daemon' and not the facility configured with the 
global configuration keyword 'log'.
In 1.9, health check logs were emitted as defined by the 'log' facility 
value.


If this is intended, I suggest adding this information to the documentation 
for 'option log-health-checks': 
http://cbonte.github.io/haproxy-dconv/2.0/configuration.html#option%20log-health-checks
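
In the meantime, a possible rsyslog-side workaround is to route everything 
logged by the haproxy program into one file regardless of facility (a sketch; 
the target path is an example):

if $programname == 'haproxy' then /var/log/haproxy.log
& stop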



Veiko




Health check logging differences between 1.9 and 2.0

2020-10-20 Thread Veiko Kukk

Hi

I've upgraded some servers from 1.9.15 to 2.0.18. Log config is very 
simple.


global
log /dev/log local0

defaults
log global
option httplog
option log-health-checks

Without any changes to the rsyslog configuration/filters, health checks are 
now filtered into /var/log/messages and not into the specified haproxy log 
files as before.


Why? Did something change between 1.9 and 2.0 regarding 'option 
log-health-checks'?


Best regards,
Veiko



Re: 2.0.14 PCRE2 JIT compilation failed

2020-04-24 Thread Veiko Kukk

On 2020-04-24 12:47, Veiko Kukk wrote:

HAproxy 2.0.14 on CentOS 7.7.1908 with PCRE2 JIT enabled (USE_PCRE2=1
USE_PCRE2_JIT=1).

When starting it with configuration that has following ACL regex line, 
it fails:


acl path_is_foo path_reg 
^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/


Error message:
error detected while parsing ACL 'path_is_foo' : regex
'^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/' jit compilation
failed.


Hi again,

As happens to many of us, a good idea for testing/debugging came right 
after asking for help.


It turned out to be an SELinux issue.

#= haproxy_t ==

# This avc can be allowed using the boolean 'cluster_use_execmem'
allow haproxy_t self:process execmem;
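
For anyone else hitting this, the boolean named above can be enabled 
persistently with the standard SELinux tooling (verify the boolean name 
against your own audit2allow output first):

setsebool -P cluster_use_execmem 1
systemctl restart haproxy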


I wonder whether the HAproxy documentation about PCRE JIT mentions anywhere 
that, when SELinux is in use, the SELinux rules must be changed for JIT to 
work. If not, it would be nice to add it.


--
Best regards,
Veiko



2.0.14 PCRE2 JIT compilation failed

2020-04-24 Thread Veiko Kukk

Hi

Since 1.9 support ends soon, I'm trying to start using 2.0 series.

HAproxy 2.0.14 on CentOS 7.7.1908 with PCRE2 JIT enabled (USE_PCRE2=1 
USE_PCRE2_JIT=1).


When starting it with configuration that has following ACL regex line, 
it fails:


acl path_is_foo path_reg 
^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/


Error message:
error detected while parsing ACL 'path_is_foo' : regex 
'^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/' jit compilation 
failed.


Apparently this regex has been working with PCRE (not PCRE2) and 
without JIT for quite a long time with the 1.9 releases of HAproxy (I have 
not personally created nor tested this regex). When compiling HAproxy 
with PCRE2 but without JIT support, haproxy does not complain about this 
regular expression; no errors at all.


I did not find much information on HAproxy's path_reg regular expression 
syntax. Is it necessary to escape forward slashes? How can I debug this 
issue, and what is wrong with this expression?
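
For what it's worth, forward slashes have no special meaning in PCRE patterns 
(they only need escaping when the pattern sits between / delimiters), so the 
ACL can be written without the backslashes; this is only a readability 
cleanup and may or may not change the JIT result:

acl path_is_foo path_reg ^/video/[a-zA-Z0-9_-]{43}/[a-z0-9]{8}/videos/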


$ haproxy -vv
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement -fwrapv -Wno-unused-label 
-Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration 
-Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers 
-Wtype-limits
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_THREAD=1 USE_REGPARM=1 
USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1


Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE 
-PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD 
-PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY 
+LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO 
+OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO 
+NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER 
+PRCTL +THREAD_DUMP -EVPORTS


Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND

Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE2 version : 10.23 2017-02-14
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)

              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace

$ ldd /sbin/haproxy
linux-vdso.so.1 =>  (0x7ffebcde1000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7f7ac1989000)
libz.so.1 => /lib64/libz.so.1 (0x7f7ac1773000)
libdl.so.2 => /lib64/libdl.so.2 (0x7f7ac156f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7f7ac1353000)
librt.so.1 => /lib64/librt.so.1 (0x7f7ac114b000)
libssl.so.10 => /lib64/libssl.so.10 (0x7f7ac0ed9000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x7f7ac0a76000)
libm.so.6 => /lib64/libm.so.6 (0x7f7ac0774000)
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x7f7ac0543000)
libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x7f7ac02cc000)
libpcre2-posix.so.1 => /lib64/libpcre2-posix.so.1 (0x7f7ac00c9000)
libc.so.6 => /lib64/libc.so.6 (0x7f7abfcfb000)
libfreebl3.so => /lib64/libfreebl3.so (0x7f7abfaf8000)
/lib64/ld-linux-x86-64.so.2 (0x7f7ac1bc)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7f7abf8ab000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7f7abf5c2000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7f7abf3be000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7f7abf18b000)
libcap.so.2 => /lib64/libcap.so.2 (0x7f7abef86000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x7f7abed5f000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x7f7abeb39000)
liblz4.so.1 => /lib64/liblz4.so.1 

Understanding resolvers usage

2020-03-20 Thread Veiko Kukk

Hi

I'd like to have a better understanding of how server-template and resolvers 
work together. HAproxy 1.9.14.


Relevant sections from config:

resolvers dns
  accepted_payload_size 1232
  parse-resolv-conf
  hold valid 90s
  resolve_retries 3
  timeout resolve 1s
  timeout retry 1s

server-template srv 4 _foo._tcp.server.name.tld ssl check resolvers dns 
resolve-prefer ipv4 resolve-opts prevent-dup-ip
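
As a side note, the SRV record that the template expands from can be checked 
directly against the local resolver (assuming dig is installed):

dig @127.0.0.1 _foo._tcp.server.name.tld SRV +short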


After some time, when I check statistics from socket:

echo "show resolvers" |/usr/bin/socat /var/run/haproxy.sock.stats1 stdio

Resolvers section dns
 nameserver 127.0.0.1:
  sent:33508
  snd_error:   0
  valid:   33502
  update:  2
  cname:   0
  cname_error: 0
  any_err: 0
  nx:  0
  timeout: 0
  refused: 0
  other:   0
  invalid: 0
  too_big: 0
  truncated:   0
  outdated:6
 nameserver 8.8.8.8:
  sent:33508
  snd_error:   0
  valid:   0
  update:  0
  cname:   0
  cname_error: 0
  any_err: 0
  nx:  0
  timeout: 0
  refused: 0
  other:   0
  invalid: 0
  too_big: 0
  truncated:   0
  outdated:33508
 nameserver 8.8.4.4:
  sent:33508
  snd_error:   0
  valid:   0
  update:  0
  cname:   0
  cname_error: 0
  any_err: 0
  nx:  0
  timeout: 0
  refused: 0
  other:   0
  invalid: 0
  too_big: 0
  truncated:   0
  outdated:33508
 nameserver 64.6.64.6:
  sent:33508
  snd_error:   0
  valid:   6
  update:  0
  cname:   0
  cname_error: 0
  any_err: 0
  nx:  0
  timeout: 0
  refused: 0
  other:   0
  invalid: 0
  too_big: 0
  truncated:   0
  outdated:33502

What I wonder about here is why all nameservers are used instead of only 
the first one when there are no issues/errors with the local caching server 
127.0.0.1:53. From the statistics, the 'sent:' value leaves me with the 
impression that all DNS servers get all requests. Is that true?


/etc/resolv.conf itself:

nameserver 127.0.0.1

nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 64.6.64.6

options timeout:1 attempts:2

I'd like to achieve a situation where the other nameservers would be used 
only when the local caching server fails. I don't want to manually configure 
only the local one in the resolvers section (no failover), and would very 
much prefer not to duplicate the nameserver config in resolv.conf and the 
HAproxy config.


--
Veiko




Re: 1.9 external health checks fail suddenly

2019-09-23 Thread Veiko Kukk

On 2019-08-28 11:13, Veiko Kukk wrote:

Applied it to 1.9.10, after ~ 12h it ran into spinlock using 400% cpu
(4 threads configured). Not sure if this is related to patch or is
some new bug in 1.9.10. I've now replaced running instance with 1.9.10
without external check patch to see if this happens again.


Now, after almost one month, with 1.9.10 (no patches) it happened again. 
All external checks failed again and a large number of zombie external 
check processes had accumulated.
Unfortunately, since I was not there when the reload was done, I can't tell 
the timeframe or the exact number of those processes.


regards,
Veiko



Re: 1.9 external health checks fail suddenly

2019-08-28 Thread Veiko Kukk

On 2019-07-11 08:35, Willy Tarreau wrote:

against your version. Normally it should work for 1.9 to 2.1.


I applied it to 1.9.10; after ~12h it ran into a spinlock using 400% CPU (4 
threads configured). I'm not sure whether this is related to the patch or is 
some new bug in 1.9.10. I've now replaced the running instance with 1.9.10 
without the external check patch to see if this happens again.


best regards,
Veiko



Re: 1.9 external health checks fail suddenly

2019-07-10 Thread Veiko Kukk

On 2019-07-09 13:59, Lukas Tribus wrote:

How are you currently working around this issue? Did you disable
external checks? I'd assume failing checks have negative impact on
production systems also.


Since this has happened so far only 3 times during 2 months, we've just 
reloaded HAproxy when it happens.


Regards,
Veiko



Re: 1.9 external health checks fail suddenly

2019-07-10 Thread Veiko Kukk

On 2019-07-09 14:29, Willy Tarreau wrote:

I didn't have a patch but just did it. It was only compile-tested,
please verify that it works as expected on a non-sensitive machine
first!


Hi, Willy

Against what version should I run this patch?

Veiko



Re: 1.9 external health checks fail suddenly

2019-07-09 Thread Veiko Kukk

On 2019-07-08 16:06, Lukas Tribus wrote:

The bug you may be affected by is:
https://github.com/haproxy/haproxy/issues/141

Can you check what happens with:
nbthread 1


I'm afraid I can't, because those are production systems that won't be 
able to serve traffic with a single thread; they have a relatively high SSL 
termination load.


Veiko



Re: 1.9 external health checks fail suddenly

2019-07-01 Thread Veiko Kukk

On 2019-07-01 10:11, Veiko Kukk wrote:

Hi

Sometimes (infrequently) all external checks hang and time out:
* Has happened with versions 1.9.4 and 1.9.8 on multiple servers with
nbproc 1 and nbthread set to (4-12) depending on server.
* Happens infrequently, last one happened after 10 days of uptime.
* External checks are written in python and write errors into their
own log file directly. When hanging happens, nothing is logged by
external check.
* Only external checks fail, common 'option httpcheck' does not fail
at the same time.


It might be useful to add that a reload helps to get over it; external 
health checks start working again.




1.9 external health checks fail suddenly

2019-07-01 Thread Veiko Kukk

Hi

Sometimes (infrequently) all external checks hang and time out:
* Has happened with versions 1.9.4 and 1.9.8 on multiple servers with 
nbproc 1 and nbthread set to (4-12) depending on server.

* Happens infrequently, last one happened after 10 days of uptime.
* External checks are written in Python and write errors into their own 
log file directly. When the hang happens, nothing is logged by the external 
check.
* Only external checks fail; the regular 'option httpchk' check does not 
fail at the same time.


HA-Proxy version 1.9.8 2019/05/13 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement -fwrapv -Wno-unused-label 
-Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration 
-Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers 
-Wtype-limits
  OPTIONS = USE_ZLIB=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 
USE_STATIC_PCRE=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 
200


Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND

Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)

              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace

Veiko



Re: [PATCH v2 1/2] MINOR: systemd: Use the variables from /etc/default/haproxy

2019-05-08 Thread Veiko Kukk

On 2019-05-06 11:00, Tim Duesterhus wrote:

From: Apollon Oikonomopoulos 

This will allow seamless upgrades from the sysvinit system while respecting
any changes the users may have made. It will also make local configuration
easier than overriding the systemd unit file.

Note by Tim:

This GPL-2 licensed patch was taken from the Debian project at [1].

It was slightly modified to cleanly apply, because HAProxy's default unit
file does not include rsyslog.service as an 'After' dependency. Also the
subject line was modified to include the proper subsystem and severity.


I think that instead of After=rsyslog.service it should be 
After=syslog.service; then any logging daemon that has Alias=syslog.service 
could be used.


https://www.freedesktop.org/wiki/Software/systemd/syslog/

Regards,
Veiko



Re: Early connection close, incomplete transfers

2019-02-20 Thread Veiko Kukk

On 2019-02-19 06:47, Willy Tarreau wrote:


This is interesting. As you observed in the trace you sent me, the
lighttpd server closes just after sending the response headers. This
indeed matches the "SD" log that haproxy emits. If it doesn't happen
in TCP mode nor with Nginx, it means that something haproxy modifies
in the request causes this effect on the server.


Hi

I'm forwarding the answer from a colleague who investigated this more 
thoroughly, especially from the lighttpd side:


we've been debugging this a bit further and it does not look like the 
issue with the seemingly random incomplete HTTP responses would be due 
to any particular request headers at the HTTP layer. It rather looks 
like something at the TCP level (so specific to HTTP mode):


A first observation we made is that the frequency of these incomplete 
transfers increases when we add a delay at the backend server after 
sending the response headers and before sending the body data. We added 
a 100 ms delay there and then got a lot of interrupted transfers that 
had only received the response headers (= no delay) but 0 bytes of the 
body (= which was sent just after delay). So the frequency with which 
this happens appears to be proportional to latencies/stalls in the 
backend server sending the response data (some read timeout logic at 
haproxy??).


We debugged further and noticed that in all cases where transfers were 
incomplete our lighttpd backend server was receiving an EPOLLRDHUP event 
on the socket where it communicates with haproxy. So it appears as if 
haproxy is *sometimes* (apparently depending on some read latency/stall 
- see above) shutting down its socket with the backend for writing 
*before* the full response and body data has been received.


And this is also basically ok because the socket remains writeable for 
lighttpd and so it could still send down the rest of the response data. 
However, it looks like lighttpd is not expecting this kind of behavior 
from the client and is not correctly handling such a half-closed TCP 
session. There is code in lighttpd to handle such an EPOLLRDHUP event and 
half-closed TCP connection, but lighttpd then also checks the state of 
the TCP session with getsockopt and keeps the connection open *only* 
when the state is TCP_CLOSE_WAIT. In all other cases upon receiving the 
EPOLLRDHUP it actively changes the state of the connection to "ERROR" 
and then closes the connection:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L908
https://github.com/lighttpd/lighttpd1.4/blob/master/src/fdevent.c#L995

We checked, and every time we have an incomplete response, lighttpd 
receives the EPOLLRDHUP event on the socket but the TCP state queried 
via getsockopt is always TCP_CLOSE (and not TCP_CLOSE_WAIT as lighttpd 
seems to expect). And because of this lighttpd then actively closes the 
half-closed connection also from its end (which likely is the cause of 
the TCP FIN sent by lighttpd as seen in the tcpdump).


When we remove this condition from lighttpd, which marks the connection 
as erroneous in case of EPOLLRDHUP and TCP state != TCP_CLOSE_WAIT, then 
the problem with the incomplete transfers disappears:


https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L922

We do not understand why this is or what the correct reaction to the 
EPOLLRDHUP event should be. In particular, we do not understand why 
lighttpd performs this check for TCP_CLOSE_WAIT or why we always get a 
state of TCP_CLOSE when we receive this event but the socket still 
continues to be writeable (so does the TCP_CLOSE just indicate that one 
direction of the connection is closed??). Still, because this 
half-closing of the connection to the backend server appears to happen 
just pretty randomly and depending on latency/stalls of the backend 
server sending down the response data, we assume that this is not the 
intended behavior by haproxy (and so possibly indicates some bug in 
haproxy too).


We assume that the reason why direct requests to the backend server or 
requests proxied via Nginx never failed is that in these cases the 
EPOLLRDHUP event never occurs and there are never any half-closed 
connections. However, we have not tested this (yet), so we did not 
re-test with Nginx to verify that lighttpd indeed never sees an 
EPOLLRDHUP.


Any ideas or suggestions based on these findings what should be the 
proper solution to the problem?


Thank you.



Re: Early connection close, incomplete transfers

2019-02-14 Thread Veiko Kukk

On 2019-02-14 18:29, Aleksandar Lazic wrote:
Replaced HAproxy with Nginx for testing and with Nginx, not a single
connection was interrupted, did millions of requests.


In 1.9.4 a lot of fixes were added.
Please can you try your tests with 1.9.4, thanks.


I already did that before writing my previous letter. No differences.

Veiko



Re: Early connection close, incomplete transfers

2019-02-14 Thread Veiko Kukk



On 2019-02-01 13:30, Veiko Kukk wrote:

On 2019-02-01 12:34, Aleksandar Lazic wrote:


Do you have any errors in lighthttpds log?


Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll
return 1, send return -1
ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never
happens when using "mode tcp".


I replaced HAproxy with Nginx for testing, and with Nginx not a single 
connection was interrupted; I did millions of requests.


Veiko



Re: Early connection close, incomplete transfers

2019-02-04 Thread Veiko Kukk

On 2019-02-01 17:02, Willy Tarreau wrote:

Hi Veiko,
Are you certain that 1.9 and 1.7 have the same issue ? I mean, you
could be observing two different cases looking similar. If you're
sure it's the same issue, it could rule out a number of parts that
differ quite a lot between the two (idle conns etc).


I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (we did 
not try 1.8, because we have never used it in production and decided to 
switch directly to 1.9), but how could we make sure it's caused by 
something different between versions if we observe very similar results? 
Since it's happening at random, it's hard to judge whether there is a slight 
change in one direction or the other. The logs look the same for all versions.


Only 'mode tcp' helps to get rid of those errors.


Do you know if the response headers are properly delivered to the
client when this happens ? And do they match what you expected ? Maybe
the contents are sometimes invalid and the response rejected by haproxy,
in which case a 502 would be returned to the client. When this happens,
emitting "show errors" on the CLI will report it.


I don't know; I don't know about the headers, and I don't have a good tool 
to capture headers for failed connections only. Any suggestions?


echo "show errors" |/usr/bin/socat /var/run/haproxy.sock.stats1 stdio
Total events captured on [04/Feb/2019:13:46:33.167] : 0

Could you also check if this happens only/more with keep-alive, close or
server-close ?


I have seen no difference, unfortunately.

If you can run more tests in your test environment, I'd be interested in
seeing how latest 2.0-dev works with these variants :


Tested with 
http://www.haproxy.org/download/2.0/src/snapshot/haproxy-ss-20190204.tar.gz



  - http-reuse never


No difference, lots of incomplete transfers.


  - http-reuse always


No difference, lots of incomplete transfers.


  - option httpclose


No difference, lots of incomplete transfers.


  - option http-server-close


No difference, lots of incomplete transfers.


  - option keep-alive


I assume you meant 'option http-keep-alive' because there is no 'option 
keep-alive'.

No difference, lots of incomplete transfers.


I'm asking for 2.0-dev because it's where all known bugs are fixed.
If none of these settings has any effect, we'll have to look at network
traces I'm afraid.


Would you like to have a network traffic dump?

Regards,
Veiko



Re: Early connection close, incomplete transfers

2019-02-01 Thread Veiko Kukk

On 2019-02-01 12:34, Aleksandar Lazic wrote:


Do you have any errors in lighthttpds log?


Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll 
return 1, send return -1

ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never happens 
when using "mode tcp".


Regards,
Veiko



Re: Early connection close, incomplete transfers

2019-02-01 Thread Veiko Kukk



On 2019-01-31 12:57, Aleksandar Lazic wrote:

Willy have found some issues which are added in the code of 2.0 tree.
Do you have a chance to test this branch or do you want to wait for
the next 1.9 release?


I tested stable 1.9.3 and the 1.9 preview version Willy linked to here: 
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html

There is no difference in my tests.


I'm not sure if it affects you as we haven't seen the config yet.
Maybe you can share your config also so that we can see if your setup
could be effected.


The commented timeouts are the original timeouts; I had increased them to 
make sure I'm not hitting any timeouts when creating higher load with tests. 
The maxconn values serve the same purpose.


global
  log /dev/log local0
  daemon
  nbproc 1
  nbthread 16
  maxconn 
  user haproxy
  spread-checks 5
  tune.ssl.default-dh-param 2048
  ssl-default-bind-options no-sslv3 no-tls-tickets
  ssl-default-bind-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS

  ssl-default-server-options no-sslv3 no-tls-tickets
  ssl-default-server-ciphers 
ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS

  tune.ssl.cachesize 10
  tune.ssl.lifetime 1800
  stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 
1 level admin


defaults
  log global
  mode http
  option httplog
  option contstats
  option log-health-checks
  retries 5
  #timeout http-request 5s
  timeout http-request 99s
  #timeout http-keep-alive 20s
  timeout http-keep-alive 99s
  #timeout connect 10s
  timeout connect 99s
  #timeout client 30s
  timeout client 99s
  timeout server 120s
  #timeout client-fin 10s
  timeout client-fin 99s
  #timeout server-fin 10s
  timeout server-fin 99s

listen main_frontend
  bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ 
maxconn 

  bind *:80 maxconn 
  option forwardfor
  acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
  monitor-uri /load_balance_health
  monitor fail if local_lighty_down
  default_backend lighty_load_balancer

backend lighty_load_balancer
  stats enable
  stats realm statistics
  http-response set-header Access-Control-Allow-Origin *
  option httpchk HEAD /dl/index.html
  server lighty0 127.0.0.1:9000 check maxconn  fall 2 inter 15s rise 
5 id 1


Test results

httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 
bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 
bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 
bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 
bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 
bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 
bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 
bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 
bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 
bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 
bytes, 448317 overhead


TOTALS:  1 connect, 1 requests, 9809 success, 191 fail, 100 
(100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 
overhead

TIMING:  81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time


HAproxy log sections of incomplete transfers (6320535 bytes should be 
transferred with this test data set):
 127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 
100/100/99/99/0 0/0 "
 127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 
0/0 "
 127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 
0/0 "
 127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 
0/0 "
 127.0.0.1:34658 [01/Feb/2019:11:22:49.447] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/254 200 425 - - SD-- 
100/100/98/98/0 0/0 "
 127.0.0.1:34386 [01/Feb/2019:11:22:49.290] main_frontend 
lighty_load_balancer/lighty0 0/0/0/0/412 200 425 - - SD-- 
100/100/98/98/0 0/0 "
 127.0.0.1:34388 [01/Feb/2019:11:22:49.290] main_frontend 

Early connection close, incomplete transfers

2019-01-31 Thread Veiko Kukk

HAproxy 1.9.3, but happens also with 1.7.10, 1.7.11.

Connections are getting closed during the data transfer phase at random 
sizes on the backend. Sometimes as little as 420 bytes gets transferred, 
but usually more is transferred before the sudden end of the connection. 
HAproxy logs show connection closing status SD-- when this happens.


Basic components of system look like this:
Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin

Our HTTP server compiles data from chunks it gets from a local cache. 
When it receives a request from a client via HAproxy, it sends the 
response header, then fetches the chunks, compiles them and sends the data 
to the client.


SD-- happens more frequently when the connection between the benchmarking 
tool and HAproxy is fast, e.g. when doing tests where the client side is not 
loaded much. It happens much more for HTTP than for HTTPS.


For example:

httpress -t1 -c10 -n1000 URL (rarely or not at all)
250 requests launched
500 requests launched
750 requests launched
1000 requests launched

TOTALS:  1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real 
concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 
452000 overhead

TIMING:  241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time

httpress -t10 -c10 -n1000 URL (happens frequently)

2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection 
closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection 
closed

1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection 
closed
thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 
bytes, 32544 overhead
thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 
bytes, 32544 overhead
thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 
bytes, 32544 overhead
thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 
bytes, 30284 overhead
thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 
bytes, 25312 overhead
thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 
bytes, 30736 overhead
thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 
bytes, 30736 overhead
thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 
bytes, 13108 overhead
thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 
bytes, 32996 overhead
thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 
bytes, 17628 overhead


TOTALS:  1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) 
real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 
278432 overhead

TIMING:  170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time

Because of the thread count difference, the -t1 (one thread) test loads the 
client side much more than the -t10 (ten threads) test does.


Random samples from HAproxy log (proper size of the object in HAproxy 
logs is 667960042 bytes for that test file).

0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

Our HTTP server logs contain hard, unrecoverable errors about being unable 
to write to the socket when HAproxy closes the connection:

Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 
msec
Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 
msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 
msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 
msec
Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 
msec
Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 
msec
Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 
msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979 
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087 
msec
Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918 
msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977 
msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079 
msec
Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060 
msec
Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881 
msec
Return Code: 32. 

Re: HA Proxy Load Balancer

2018-12-21 Thread Veiko Kukk

On 2018-12-20 20:41, Lance Melancon wrote:

Thanks for the info. Unfortunately I am not a programmer by a long
shot and syntax is a big problem for me. I tried a few things but no
luck and I can't find any examples of a redirect.
So do I need both the backend and acl statements?
I'm simply trying to use mysite.net to direct to mysite.net/website.
Any time I use a / the config fails.


Maybe this will help you 
http://www.catb.org/esr/faqs/smart-questions.html


Veiko



Re: 1.7.11 with gzip compression serves incomplete files

2018-12-06 Thread Veiko Kukk

Hi, Willy

On 2018-12-06 04:43, Willy Tarreau wrote:

In the mean time it would be useful to see if adding
"option http-pretend-keepalive" helps. This way we'll know if
it's the server closing first or haproxy closing first which
triggers this. And if it turns out that it fixes the issue for
you, it could be a good temporary workaround.


Indeed, adding "option http-pretend-keepalive" helps.

Veiko



Re: 1.7.11 with gzip compression serves incomplete files

2018-12-05 Thread Veiko Kukk

On 2018-11-30 09:40, Christopher Faulet wrote:


Now, I'm still puzzled with this issue. Because I can't reproduce it
for now. And it is even more strange because when the compression is
enabled and used on a response, it cannot be switched in TUNNEL mode.
So I don't really understand how the patch you mentioned could fix a
compression bug, or the commit 8066ccd39 (as stated on discourse)
could be the origin of the bug.


I've found that 'option http-server-close' in the frontend is causing this. 
After commenting it out, 1.7.11 works fine with gzip compression.


I'm not gathering more debug data at the moment, maybe this already 
helps to reproduce the issue. I will provide more if necessary.



Finally, I have a last question. You said the result is truncated. It
means the response is truncated because of a close and not all chunks
are received ? Or the response is correct from the HTTP point of view,
but the file is truncated once uncompressed ?


The uncompressed file is truncated; part of the end is missing.
I'm not sure what you mean by correctness from the HTTP point of view, but 
the headers look fine.



GET /assets/js/piwik.js?v=1534 HTTP/1.1
Host: foobar.tld
User-Agent: curl/7.61.1
Accept: */*
Accept-Encoding: deflate, gzip, br


{ [5 bytes data]
< HTTP/1.1 200 OK
< Date: Wed, 05 Dec 2018 15:45:26 GMT
< Content-Encoding: gzip
< Content-Language: en
< Content-Location: http://foobar.tld/assets/js/piwik.js?v=1534
< Content-Type: application/x-javascript; charset=UTF-8
< Expires: Sat, 02 Dec 2028 15:45:26 GMT
< Last-Modified: Fri, 23 Nov 2018 15:51:00 GMT
< Cache-Control: max-age=31536
< Date: Wed, 05 Dec 2018 15:45:26 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.9
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close

65236 Dec  5 16:29 piwik.js (without option http-server-close)
41440 Dec  5 17:45 piwik.js_ (with option http-server-close)

Regards,
Veiko



1.7.11 with gzip compression serves incomplete files

2018-11-26 Thread Veiko Kukk

Hi!

There is not much to add, just that it was broken before in 1.7.9 and 
is broken again in 1.7.11. It works with 1.7.10.
When applying the patch provided here 
https://www.mail-archive.com/haproxy@formilux.org/msg27155.html, 1.7.11 
also works.
Testing is really simple: just configure haproxy gzip compression and 
download with curl --compressed or with a web browser. A sample .js file I 
downloaded has a real size of 42202 bytes, but when downloading with gzip 
compression its size is 37648 bytes; part of the end is missing.
A very similar issue is discussed here too: 
https://discourse.haproxy.org/t/1-7-11-compression-issue-parsing-errors-on-response/2542
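
A minimal reproduction setup looks roughly like this (a sketch; section 
placement, MIME types and the URL are placeholders, not our real config):

# haproxy.cfg excerpt
defaults
  mode http
  compression algo gzip
  compression type text/html application/x-javascript application/javascript

# client side
curl --compressed -o out.js https://example.tld/assets/js/piwik.js
ls -l out.js        # compare with the uncompressed size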


Best regards,
Veiko



Understanding certain balance configuration

2018-07-30 Thread Veiko Kukk

Hi,

I'm trying to understand how 'balance url_param' with 'hash-type consistent' 
should work. HAproxy 1.7.11.


Let's say we have a config of two haproxy instances that balance content 
between local and remote (sibling) servers.

server0 (10.0.0.1) would have config section like this:

backend load_balancer
  balance url_param file_id
  hash-type consistent
  server local_backend /path/to/socket id 1
  server remote_backend 10.0.0.2:80 id 2

backend local_backend
  balance url_param file_id
  hash-type consistent
  server server0 127.0.0.1:100
  server server1 127.0.0.1:200

server1 (10.0.0.2) would have config section like this:

backend load_balancer
  balance url_param file_id
  hash-type consistent
  server local_backend /path/to/socket id 2
  server remote_backend 10.0.0.1:80 id 1

backend local_backend
  balance url_param file_id
  hash-type consistent
  server server0 127.0.0.1:100
  server server1 127.0.0.1:200

Assuming that all requests indeed have the URL parameter "file_id", should 
requests on both servers only reach a single "local_backend" server, since 
they are already balanced and are no longer divided in "local_backend" 
because of the identical configuration of both "load_balancer" and 
"local_backend"?


thanks in advance,
Veiko



Re: force-persist and use_server combined

2018-07-30 Thread Veiko Kukk

On 07/25/2018 03:05 PM, Veiko Kukk wrote:
The idea here is that HAproxy statistics page, some other backend 
statistics and also some remote health checks running against path under 
/dl/ would always reach only local_http_frontend, never go anywhere else 
even when local really is down, not just marked as down.


This config does not work, it forwards /haproxy?stats request to 
remote_http_frontend when local_http_frontend is really down.


Is it expected? Any ways to overcome this limitation?


I wonder if my question was too stupid or was just left unnoticed by 
someone who knows how force-persist is supposed to work.


Meanwhile, I've created a workaround by adding additional config sections 
and using a use_backend ACL instead of a use_server ACL to achieve what was 
needed.
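
Roughly, the workaround looks like this (a sketch with made-up section names, 
reusing the ACLs and server addresses from the original post, not the exact 
production config):

frontend http_in
  acl status0 path_beg -i /dl/
  acl status1 path_beg -i /haproxy
  use_backend local_only if status0 or status1
  default_backend load_balancer

backend local_only
  server local_http_frontend /var/run/haproxy.sock.http-frontend send-proxy

backend load_balancer
  balance url_param file_id
  hash-type consistent
  server local_http_frontend /var/run/haproxy.sock.http-frontend check send-proxy
  server remote_http_frontend 192.168.1.52:8080 check send-proxy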


regards,
Veiko




force-persist and use_server combined

2018-07-25 Thread Veiko Kukk

Hi,

I'd like to understand if I've made a mistake in configuration or there 
might be a bug in HAproxy 1.7.11.


defaults section has "option redispatch".

backend load_balancer
  mode http
  option httplog
  option httpchk HEAD /load_balance_health HTTP/1.1\r\nHost:\ foo.bar
  balance url_param file_id
  hash-type consistent

  acl status0 path_beg -i /dl/
  acl status1 path_beg -i /haproxy
  use-server local_http_frontend if status0 or status1
  force-persist if status0 or status1

  server local_http_frontend /var/run/haproxy.sock.http-frontend check 
send-proxy

  server remote_http_frontend 192.168.1.52:8080 check send-proxy


The idea here is that the HAproxy statistics page, some other backend 
statistics and also some remote health checks running against a path under 
/dl/ would always reach only local_http_frontend and never go anywhere else, 
even when the local one really is down, not just marked as down.


This config does not work, it forwards /haproxy?stats request to 
remote_http_frontend when local_http_frontend is really down.


Is it expected? Any ways to overcome this limitation?

Thanks in advance,
Veiko




Re: Truly seamless reloads

2018-06-01 Thread Veiko Kukk

On 31/05/18 23:15, William Lallemand wrote:

Sorry but unfortunately we are not backporting features in stable branches,
those are only meant for maintenance.

People who want to use the seamless reload should migrate to HAProxy 1.8, the
stable team won't support this feature in previous branches.



I've been keeping an eye on this list regarding 1.8-related bugs, and it 
does not seem to me that 1.8 is stable enough yet for production use. There 
are too many reports about high CPU usage and/or crashes.
We are still using 1.6, which finally seems to have stabilized enough for 
production. When we started using 1.6 some years ago, we had many issues 
with it which caused service interruptions. I would not want to repeat 
that again.


Even with 1.7, processes would hang forever after reload (days, 
sometimes weeks or until reboot). Really hard to debug, happens only 
under production load.


I will look at the patches provided by Dave. We are building HAproxy RPMs 
for ourselves anyway; applying some patches in the spec file does not seem 
to be that much additional work if those would indeed provide truly 
seamless reloads.


Best regards,
Veiko



Re: Truly seamless reloads

2018-04-30 Thread Veiko Kukk

On 26/04/18 17:11, Veiko Kukk wrote:

Hi,

According to 
https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ 
:


"The patchset has already been merged into the HAProxy 1.8 development 
branch and will soon be backported to HAProxy Enterprise Edition 1.7r1 
and possibly 1.6r2."


Has it been backported to 1.7 and/or 1.6?

If yes, then should seamless reload also work with multiprocess 
configurations? (nbproc > 1).


Can I assume the answer is no for both questions?


Veiko




Truly seamless reloads

2018-04-26 Thread Veiko Kukk

Hi,

According to 
https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ 
:


"The patchset has already been merged into the HAProxy 1.8 development 
branch and will soon be backported to HAProxy Enterprise Edition 1.7r1 
and possibly 1.6r2."


Has it been backported to 1.7 and/or 1.6?

If yes, then should seamless reload also work with multiprocess 
configurations? (nbproc > 1).
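
For reference, on 1.8 the mechanism described in that blog post is wired up 
roughly like this (a sketch; the socket and pid file paths are examples):

# global section (1.8+)
stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners

# reload: the new process fetches the listening sockets from the old one
haproxy -f /etc/haproxy/haproxy.cfg -x /var/run/haproxy.sock -sf $(cat /var/run/haproxy.pid)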


Thanks in advance,
Veiko



Re: 1.7.10 and 1.6.14 always compress response

2018-04-10 Thread Veiko Kukk

On 04/10/2018 03:51 PM, William Lallemand wrote:

On Tue, Apr 10, 2018 at 03:43:12PM +0300, Veiko Kukk wrote:

This happens even when either compression algo nor compression type are
specified in haproxy configuration file.


If you didn't specify any compression keyword in the haproxy configuration
file, that's probably your backend server which is doing the compression.


Actually, you are right.
What is surprising is that when requesting non-compressed content from 
haproxy, it still passes through compressed data.

Maybe that's what the standard specifies, I don't know.

Thanks,
Veiko





1.7.10 and 1.6.14 always compress response

2018-04-10 Thread Veiko Kukk

Hi,


Let's run a simple query against the host (real hostnames replaced).

curl https://testhost01.tld -o /dev/null -vvv

Request headers:

> GET / HTTP/1.1
> Host: testhost01.tld
> User-Agent: curl/7.58.0
> Accept: */*

Response headers:

< HTTP/1.1 200 OK
< Date: Tue, 10 Apr 2018 12:23:44 GMT
< Content-Encoding: gzip
< Content-Type: text/html;charset=utf-8
< Cache-Control: no-cache
< Date: Tue, 10 Apr 2018 12:23:44 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.4
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=15768000

This happens even when neither compression algo nor compression type is 
specified in the haproxy configuration file.


But let's say we indicate in the request that we don't want any compression:

curl https://testhost01.tld -H "Accept-Encoding: identity" -o /dev/null -vvv

Request headers:

> GET / HTTP/1.1
> Host: testhost01.tld
> User-Agent: curl/7.58.0
> Accept: */*
> Accept-Encoding: identity

Response headers:

< HTTP/1.1 200 OK
< Date: Tue, 10 Apr 2018 12:40:25 GMT
< Content-Encoding: gzip
< Content-Type: text/html;charset=utf-8
< Cache-Control: no-cache
< Date: Tue, 10 Apr 2018 12:40:25 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.4
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=15768000

Still, response is gzipped.

HA-Proxy version 1.6.14-66af4a1 2018/01/02
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement -fwrapv

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.





Re: Logging errors during reload of haproxy

2017-11-03 Thread Veiko Kukk

Hi Lukas,

On 11/03/2017 02:53 PM, Lukas Tribus wrote:

# service haproxy reload
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource temporarily
unavailable (errno=11)


Well the destination logging socket is unavailable. I don't think
there is a lot to do here
on the haproxy side, this mostly depends on the destination socket and
the kernel.

I would suggest you use a UDP destination instead. That should be
better suited to
handle logging at this rate.


This is a test system with not much load other than what my little 'ab -c 10 
...' run is creating. We use unix socket logging everywhere locally; it works 
even under heavy load.


First I suspected the config change where I added 'log /dev/log local0', 
but after commenting that out, those messages still appear: once per process 
after reload, every time when doing quick reloads, e.g. for i in {1..10}; do 
service haproxy reload; done. But sometimes it happens even when not 
reloading quickly. I have a cronjob that runs every 3 minutes and reloads 
haproxy, and then this error appears sometimes, not each time.



another bug about processes never closing after reload


Unless you are hitting a bug already fixed (make sure you use a
current stable release), it's
likely that long running sessions keep haproxy running.

Use the hard-stop-after directive to limit the time haproxy spends in
this state:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#3.1-hard-stop-after


I would rather not comment on the hanging-process bug further in this 
thread, because it's off topic. I will create a new thread for that. It was 
planned anyway, but I wanted to create reproduction instructions first. So 
far, it's quite random...


Regards,
Veiko




Re: Logging errors during reload of haproxy

2017-11-03 Thread Veiko Kukk

On 11/03/2017 01:21 PM, Veiko Kukk wrote:

Hi,

I noticed, while trying to reproduce conditions for another bug about 
processes never closing after restart, that sometimes reload causes 
logging errors displayed.


Should read here "never closing after *reload*".

Veiko



Logging errors during reload of haproxy

2017-11-03 Thread Veiko Kukk

Hi,

I noticed, while trying to reproduce the conditions for another bug about 
processes never closing after restart, that a reload sometimes causes 
logging errors to be displayed.


Following config section might be relevant:

global
  log /dev/log local0
  nbproc 3

defaults
  log /dev/log local0

frontend foo
  log /dev/log local1

...

# service haproxy reload
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource 
temporarily unavailable (errno=11)
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource 
temporarily unavailable (errno=11)
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource 
temporarily unavailable (errno=11)


# haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement -fwrapv

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[TRACE] trace
[COMP] compression

Veiko




Re: Possible regression in 1.6.12

2017-06-16 Thread Veiko Kukk

Hi, Willy

On 16/06/17 12:15, Willy Tarreau wrote:


So I have more info on this now. Veiko, first, I'm assuming that your config
was using "resolvers dns_resolvers" on the "server" line, otherwise resolvers
are not used.


My real-world configs use resolvers, but the timeouts happen even when the 
resolver is not used anywhere. That is why I did not include resolvers on 
the backend server line in the example config provided with the initial 
report e-mail. When keeping only a single nameserver in the resolvers 
section, I did not notice any timeouts, and it did not matter whether that 
single nameserver was local or Google.



What I've seen when running your config here is that google responds both in
IPv4 and IPv6. And depending on your local network settings, if you can't
reach them over IPv6 after the address was updated, your connection might
get stuck waiting for the connect timeout to strike (10s in your conf,
multiplied by the number of retries). The way to address this is to add
"resolve-prefer ipv4" at the end of your server line, it will always pick
IPv4 addresses only.


We have 'resolve-prefer ipv4' enabled in the real-world configuration where 
the resolver is actually used on a 'server' line, and we have disabled IPv6 
on all our servers. Anyway, since the timeouts happen even without using the 
resolver anywhere, this cannot be the cause of the timeouts.
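
For reference, the combination being discussed looks roughly like this on a 
server line (names are illustrative, taken from the test config):

  server ssl_server google.com:443 check ssl verify none resolvers dns_resolvers resolve-prefer ipv4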



BTW (it was probably just for illustration purposes), please don't use
well-known services like google, yahoo or whatever for health checks. If
everyone does this, it will add a huge useless load to their servers.


It was just so that anybody could use a simple, trimmed-down configuration 
for quick testing. The real configuration has no need for google.com as a 
backend and is much more complex.


This exact configuration can easily be used to test 1.6.12: a simple reload 
causes the first two google.com checks to fail with timeouts. Also, any 
requests against the ssl-frontend fail during the first few checks after a 
reload.


Regards,
Veiko






Re: Possible regression in 1.6.12

2017-06-15 Thread Veiko Kukk

On 14/06/17 17:37, Willy Tarreau wrote:


Could you try to revert the attached patch which was backported to 1.6
to fix an issue where nbproc and resolvers were incompatible ? To do
that, please use "patch -Rp1 < foo.patch".


I have applied the patch as instructed. HAProxy now works as it did in 
version 1.6.11; no requests time out.



Also, have you noticed if your haproxy continues to work or if it loops
at 100% CPU for example ?


No, there is no excessive CPU load.

Best regards,
Veiko





Possible regression in 1.6.12

2017-06-14 Thread Veiko Kukk


I might have discovered a haproxy bug. It occurs when all of the 
following configuration conditions are satisfied:

* haproxy version 1.6.12
* multiple processes
* a resolvers section with more than one nameserver configured (not even 
used anywhere)

* haproxy is either reloaded or restarted
* a request is made against the freshly reloaded/restarted haproxy, or a 
backend server health check is performed; in both cases the request gets 
no response


When accessing haproxy, requests time out. Backends fail their checks and 
are marked as down with a timeout error. This happens with browsers, curl 
and wget. After downgrading to 1.6.11, the timeouts do not happen.


How I tested:
1) reload haproxy with the minimal config below
2) then run: for i in {1..100}; do date --utc; echo $i; curl 
https://tsthost.tld/haproxy?stats -o /dev/null -s -m 50; done

Wed 14 Jun 11:45:44 UTC 2017
1
Wed 14 Jun 11:46:34 UTC 2017
2
Wed 14 Jun 11:47:24 UTC 2017
3
Wed 14 Jun 11:48:14 UTC 2017
4
Wed 14 Jun 11:48:14 UTC 2017
5
Wed 14 Jun 11:49:04 UTC 2017
6
Wed 14 Jun 11:49:05 UTC 2017
7
Wed 14 Jun 11:49:55 UTC 2017
8
Wed 14 Jun 11:49:55 UTC 2017
9
Wed 14 Jun 11:50:45 UTC 2017
10
Wed 14 Jun 11:50:46 UTC 2017
11
Wed 14 Jun 11:50:46 UTC 2017
12
Wed 14 Jun 11:50:46 UTC 2017

When removing either the multi-process configuration or the resolvers 
section, no requests time out.


Following is trimmed down minimal config:
global
  daemon
  nbproc 3
  maxconn 500
  user haproxy
  tune.ssl.default-dh-param 2048
  ssl-default-bind-options no-sslv3 no-tls-tickets
  ssl-default-bind-ciphers 
AES128+EECDH:AES128+EDH:!ADH:!AECDH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK

  ssl-default-server-options no-sslv3 no-tls-tickets
  ssl-default-server-ciphers 
AES128+EECDH:AES128+EDH:!ADH:!AECDH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK

  stats socket /var/run/haproxy1.sock mode 600 process 1
  stats socket /var/run/haproxy2.sock mode 600 process 2
  stats socket /var/run/haproxy3.sock mode 600 process 3

defaults
  bind-process 3
  log /dev/log local0
  option log-health-checks
  option contstats
  timeout connect 10s
  timeout client 60s
  timeout server 60s

resolvers dns_resolvers
  # local caching named
  nameserver dns0 127.0.0.1:53
  # remote servers
  nameserver dns1 8.8.8.8:53
  nameserver dns2 8.8.4.4:53

listen ssl-frontend
  bind-process 1-2
  bind *:443 ssl crt /path/to/certificate.pem
  server http-frontend 127.0.0.1:666 send-proxy check

frontend http-frontend
  mode http
  stats enable
  option forwardfor
  option httplog
  bind *:80
  bind 127.0.0.1:666 accept-proxy

backend ssl_backend
  mode http
  option httplog
  server ssl_server google.com:443 check ssl verify none fall 2 inter 
5s fastinter 3s rise 3



HA-Proxy version 1.6.12 2017/04/04
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.



Re: Multiple url parameter based session limiting

2016-10-05 Thread Veiko Kukk

On 04/10/16 18:21, Veiko Kukk wrote:

Let's say we have the URL http://domain.tld?foo=abc&bar=def.
I'd like to limit current sessions with stick tables when both the foo and
bar values match, but I'm not sure how to achieve this (in the most
optimal way).


I found a similar post from 2013. It is not exactly what I need, but it is 
similar in the sense that it also requires matching several query parameters.


https://www.mail-archive.com/haproxy@formilux.org/msg11680.html

Are the per-request variables now available, and how would I use them?
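
If they are available, I would expect basic usage to look roughly like this 
(an untested sketch on my part):

  # store each query parameter in a per-transaction variable
  http-request set-var(txn.foo) url_param(foo)
  http-request set-var(txn.bar) url_param(bar)
  # and reference them later, e.g. in ACLs
  acl has_foo var(txn.foo) -m found
  acl has_bar var(txn.bar) -m found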

Regards,
Veiko



Multiple url parameter based session limiting

2016-10-04 Thread Veiko Kukk

Hi,

Let's say we have the URL http://domain.tld?foo=abc&bar=def.
I'd like to limit current sessions with stick tables when both the foo and 
bar values match, but I'm not sure how to achieve this (in the most 
optimal way).

Stick tables are somewhat hard for me to understand.

stick-table type string len 48 size 1m expire 90m store conn_cur
tcp-request inspect-delay 2s
tcp-request content track-sc0 urlp(foo) if HTTP

This only tracks the url parameter foo, but I'd like something like 
urlp(foo and bar), which is not possible according to the documentation.
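
One workaround I am considering (untested) is to build a composite key 
through a temporary request header, since http-request set-header accepts 
log-format expressions (the X-Stick-Key header name is arbitrary):

  stick-table type string len 96 size 1m expire 90m store conn_cur
  # build "<foo>_<bar>" as the tracking key, then drop the helper header
  http-request set-header X-Stick-Key %[url_param(foo)]_%[url_param(bar)]
  http-request track-sc0 hdr(X-Stick-Key)
  http-request del-header X-Stick-Key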


Any suggestions how to accomplish this?

Regards,
Veiko



Re: Haproxy 1.6.9 failed to compile regex

2016-09-07 Thread Veiko Kukk


On 07/09/16 14:37, Veiko Kukk wrote:

I tried to upgrade from 1.6.8 to 1.6.9, but found strange errors printed
by haproxy 1.6.9. Any ideas, why?


Another strange issue is that 1.6.9 shows:
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010

The system has openssl 1.0.1e-48.el6_8.1 installed and nothing else. So how 
is it possible that haproxy reports a different version than the system has?


On the other hand, 1.6.8 reports the proper OpenSSL version:
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013

Veiko





Haproxy 1.6.9 failed to compile regex

2016-09-07 Thread Veiko Kukk

Hi,

I tried to upgrade from 1.6.8 to 1.6.9, but found strange errors printed 
by haproxy 1.6.9. Any ideas, why?


[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:57] : 
'reqirep' : regular expression '^([^ :]*) /(.*)' : failed to compile 
regex '^([^ :]*) /(.*)' (error=unknown or incorrect option bit(s) set)


[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:205] : 
'reqidel' : regular expression '^If-Match:.*' : failed to compile regex 
'^If-Match:.*' (error=unknown or incorrect option bit(s) set)


[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:279] : 
'rspidel' : regular expression '^Content-Location' : failed to compile 
regex '^Content-Location' (error=unknown or incorrect option bit(s) set)



Downgrading to 1.6.8 solves this error.

# haproxy -vv
HA-Proxy version 1.6.9 2016/08/30
Copyright 2000-2016 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

# haproxy -vv
HA-Proxy version 1.6.8 2016/08/14
Copyright 2000-2016 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko



Re: 100% cpu , epoll_wait()

2016-05-23 Thread Veiko Kukk

On 18/05/16 15:42, Willy Tarreau wrote:

Hi Sebastian,

On Thu, May 12, 2016 at 09:58:22AM +0200, Sebastian Heid wrote:

Hi Lukas,

starting from around 200 Mbit/s inbound, haproxy processes (nbproc 6) are
hitting 100% cpu regularly (I noticed up to 3 processes at 100% at the same
time), but they recover again on their own after some time.

stracing such a process yesterday showed the following:
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0

Unfortunately I can't do any more debugging in this setup. HAproxy 1.5.14 is
never near to 10% cpu usage with way higher bandwidth.


So far I've got good reports from people having experienced similar issues
with recent versions, thus I'm thinking about something, are you certain
that you did a make clean after upgrading and before rebuilding ? Sometimes
we tend to forget it, especially after a simple "git pull". It is very
possible that some old .o files were not properly rebuilt and still contain
these bugs. If in doubt, you can simply keep a copy of your latest haproxy
binary, make clean, build again and run cmp between them. It should not
report any difference otherwise it means there was an issue (which would be
a great news).


I can confirm that on CentOS 6 with HAproxy 1.6.5 this 100% CPU load 
still happens. Exactly the same:

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, ^CProcess 6200 detached
 

# haproxy -vv
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko




Re: 100% cpu , epoll_wait()

2016-04-20 Thread Veiko Kukk

On 20/04/16 11:43, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 09:53:36PM +0300, Veiko Kukk wrote:

On 19/04/16 18:52, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
OK in fact it's different. Above we have a busy polling loop, which may
very well be caused by the buffer space miscalculation bug and which results
in a process not completing its job until a timeout strikes. The link to
the other report shows a normal polling with blocked signals.


The process that was created yesterday via a soft reload went to 100% cpu
today.

haproxy  29388  5.0  0.0  58772 11700 ?Rs   Apr17 156:44
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf
1997

Section from strace output:


(...)

this below :


epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0


is not good unfortunately. I'm assuming this is with 1.5.17, that's it ? If
so we still have an issue :-/


It is 1.6.3

# haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko




Re: 100% cpu , epoll_wait()

2016-04-19 Thread Veiko Kukk

On 19/04/16 18:52, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
OK in fact it's different. Above we have a busy polling loop, which may
very well be caused by the buffer space miscalculation bug and which results
in a process not completing its job until a timeout strikes. The link to
the other report shows a normal polling with blocked signals.


The process that was created yesterday via a soft reload went to 100% cpu 
today.


haproxy  29388  5.0  0.0  58772 11700 ?Rs   Apr17 156:44 
/usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid 
-sf 1997


Section from strace output:

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(32, 
"\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(32, 
"\366\334si\251\272Y\372\360'/\363\212\246\262w\307[\251\375\314\236whe\302\337\257\25NQ\370"..., 
1024, 0, NULL, NULL) = 1024
sendto(18, 
"\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 8016
sendto(18, 
"\355\265\207\360\357\3046k\364\320\330\30d\247\354\273BE\201\337\4\265#\357Z\231\231\337\365*\242\345"..., 
8376, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 18, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=18, 
u64=18}}) = 0

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(32, 
"@OR\224\335\233\263\347U\245X\376)\240\342\334\242\31\321\322\354\222\276\233\247\316-\263\370)\252U"..., 
8016, 0, NULL, NULL) = 8016

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {{EPOLLOUT, {u32=53, u64=53}}}, 200, 0) = 1
sendto(53, 
"\274'[\24\n\264*b\306\253YA\313A\36\202a\177\317\370K:\302\230\315.\315\215\f&\351\27"..., 
14032, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 14032
sendto(53, 
"\234CS\236wYsf\267\24\276v\325\302\267+a\303\336\250\211x\236\33\23MR_\324\214A\264"..., 
2360, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 2360
recvfrom(55, 
"\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"i\244\305N\242I\177n'4g\211\256%\26X\34il\3374\34HN\22\365\357\211Y\354\306K"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLRDHUP, {u32=53, u64=53}}) = 0
epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"-\r\21\326\326\0\0>\346-?\375\325J\346N\336\353Jz\376\303\373?\226y}\317\257\371\304t"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"\233.\20B\337\343\274\311\212\211\241\244\5\257\221w1{\253Kjh\23?w\357\365\377\335\261\3\215"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"\225\256\37Qib\371\377\220l\342\20\2742\271\3360U\224\0375?ju\10\207\235J\267\35\340\367"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 13312
sendto(53, 
"\372\265\334\263\232\2016l2\216\372\261B\26\243\252\204\220\353\f\367\215\331\232\203hI,\260\37\207\357"..., 
3080, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=53, 
u64=53}}) = 0

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"k\33\342U\260:Z\350\3725>\211R@\20\347\326\363\203\36?\226\304\241\367\263B\242\230\6^\221"..., 
13312, 0, NULL, NULL) = 13312

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0

Re: 100% cpu , epoll_wait()

2016-04-19 Thread Veiko Kukk


On 16/04/16 01:53, Jim Freeman wrote:

I'm suspecting that a connection to the stats port goes wonky with a
'-sf' reload, but I'll have to wait for it to re-appear to poke
further.  I'll look first for a stats port connection handled by the
pegged process, then use 'tcpkill' to kill just that connection
(rather than the whole process, which may be handling other
connections).


We use haproxy 1.6.3 (the latest on CentOS 6.7) and experience a similar 
situation after some reloads (-sf). The old haproxy process does not exit 
and uses 100% cpu; strace shows:

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0

In our case, it was a tcp backend tunnelling rsyslog messages. After 
restarting the local rsyslogd, the load was gone and the old haproxy 
instance exited. It's hard to tell how many reloads it takes to make haproxy 
go crazy, or what an exact reproducible test would be, but it does not take 
hundreds of reloads; rather 10-20 (our reloads are not very frequent) to 
make haproxy go crazy.
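
For context, the kind of TCP log-forwarding section involved is roughly this 
(a simplified sketch; addresses and names are illustrative, not our exact 
config):

listen rsyslog-tunnel
  mode tcp
  bind 127.0.0.1:5140
  server central-syslog 192.0.2.10:514 check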


$ haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing 
-Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT 
IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Best regards,
Veiko



Choosing backend based on constant

2015-04-30 Thread Veiko Kukk

Hi everybody

I'd like to simplify my haproxy configuration management by using almost 
identical configurations for different groups of haproxy installations, with 
each group using a different backend selected by string comparison. The only 
difference between the haproxy configuration files of the different groups 
would be that string.


The configuration logic would be something like this (not syntactically 
correct for haproxy, I know, but should show what I wish to accomplish):


constant = foo # first hostgroup configuration
constant = bar # second hostgroup configuration

# common configuration for all hostgroups
use_backend ha_backend_foo if constant == foo
use_backend ha_backend_bar if constant == bar
...

I wonder how to specify that string and form an ACL to use in the 
'use_backend' statement?
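
One idea I have been considering (untested, and assuming a version where the 
env() sample fetch is available) is to export the group name in haproxy's 
environment and match on it, e.g.:

  # HAP_GROUP=foo exported by the init script (variable name is arbitrary)
  use_backend ha_backend_foo if { env(HAP_GROUP) -m str foo }
  use_backend ha_backend_bar if { env(HAP_GROUP) -m str bar }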


Thanks in advance,
Veiko







Re: Choosing backend based on constant

2015-04-30 Thread Veiko Kukk
I'd like to manually add that constant string to the configuration, not 
derive it from the traffic. It would help reduce differences in the haproxy 
configuration file between server groups and make migration between groups 
easier.


Best regards,
Veiko

On 30/04/15 18:06, Baptiste wrote:

On Thu, Apr 30, 2015 at 11:49 AM, Veiko Kukk vk...@xvidservices.com wrote:

Hi everybody

I'd like to simplify my haproxy configuration management by using almost
identical configurations for different groups of haproxy installations, with
each group using a different backend selected by string comparison. The only
difference between the haproxy configuration files of the different groups
would be that string.

The configuration logic would be something like this (not syntactically
correct for haproxy, I know, but should show what I wish to accomplish):

constant = foo # first hostgroup configuration
constant = bar # second hostgroup configuration

# common configuration for all hostgroups
use_backend ha_backend_foo if constant == foo
use_backend ha_backend_bar if constant == bar
...

I wonder how to specify that string and form an ACL to use in the
'use_backend' statement?

Thanks in advance,
Veiko



Hi Veiko,

The question is: how do you set your constant? What piece of information do
you use, from the traffic or elsewhere?
Then we can help you.

Baptiste