Re: 100% cpu , epoll_wait()

2016-04-19 Thread Veiko Kukk


On 16/04/16 01:53, Jim Freeman wrote:

> I suspect that a connection to the stats port goes wonky with a
> '-sf' reload, but I'll have to wait for it to reappear to poke
> further. I'll look first for a stats port connection handled by the
> pegged process, then use 'tcpkill' to kill just that connection
> (rather than the whole process, which may be handling other
> connections).


We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
situation after some reloads (-sf). The old haproxy process does not
exit and uses 100% CPU; strace shows:

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0

In our case, it was a TCP backend tunnelling rsyslog messages. After
restarting the local rsyslogd, the load was gone and the old haproxy
instance exited. It's hard to tell how many reloads it takes to make
haproxy go crazy, or what an exact reproducible test would be. But it
does not take hundreds of restarts, rather 10-20 (our reloads are not
very frequent), to make haproxy go crazy.


$ haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement

  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND


Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Best regards,
Veiko



Re: 100% cpu , epoll_wait()

2016-04-19 Thread Lukas Tribus

Hi,


On 19.04.2016 at 09:39, Veiko Kukk wrote:



> We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> situation after some reloads (-sf). The old haproxy process does not
> exit and uses 100% CPU; strace shows:
>
> epoll_wait(0, {}, 200, 0)   = 0
> epoll_wait(0, {}, 200, 0)   = 0
> epoll_wait(0, {}, 200, 0)   = 0
> epoll_wait(0, {}, 200, 0)   = 0
> epoll_wait(0, {}, 200, 0)   = 0
> epoll_wait(0, {}, 200, 0)   = 0
>
> In our case, it was a TCP backend tunnelling rsyslog messages. After
> restarting the local rsyslogd, the load was gone and the old haproxy
> instance exited. It's hard to tell how many reloads it takes to make
> haproxy go crazy, or what an exact reproducible test would be. But it
> does not take hundreds of restarts, rather 10-20 (our reloads are not
> very frequent), to make haproxy go crazy.


Also matches this report from December:
https://www.mail-archive.com/haproxy@formilux.org/msg20772.html



Lukas



Re: HAProxy rejecting requests w/ extended characters in their URLs as bad

2016-04-19 Thread CJ Ess
That will work for now; in the future it would be nice to have an option to
allow non-control UTF-8 characters in the URI without enabling all of the
other stuff.


On Mon, Apr 18, 2016 at 4:59 PM, PiBa-NL  wrote:

> On 18-4-2016 at 22:47, CJ Ess wrote:
>
>> This is using HAProxy 1.5.12 - I've noticed an issue where HAProxy is
>> sometimes rejecting requests with a 400 code when the URL string contains
>> extended characters. Nginx is fronting HAProxy and has passed them through
>> as valid requests, and just eyeballing them they look OK to me.
>>
>> An example is a German URL with 0xc3 0x95 contained in the URL.
>>
>> A second example is a Latin URL with 0xc3 0xa7 contained in the URL.
>>
>> A third example is an Asian URL with 0xe6 0xac 0xa1 0xe3 contained in the
>> URL (and many more, so I may or may not have complete characters in the
>> example).
>>
>> I don't know the encoding these characters are part of; there are no
>> hints in the other headers.
>>
>> Any idea what I can do to have haproxy accept these?
>>
> Have you tried:
> http://cbonte.github.io/haproxy-dconv/snapshot/configuration-1.6.html#4.2-option%20accept-invalid-http-request
> Technically, though, the requests are invalid and should be fixed/avoided
> if possible.
>


Re: HAProxy rejecting requests w/ extended characters in their URLs as bad

2016-04-19 Thread Lukas Tribus

Hi,


On 19.04.2016 at 15:10, CJ Ess wrote:
> That will work for now; in the future it would be nice to have an
> option to allow non-control UTF-8 characters in the URI without
> enabling all of the other stuff.


That's exactly what this option already does.

There is fortunately no way in haproxy to allow control characters in 
those headers.




Lukas



Re: 100% cpu , epoll_wait()

2016-04-19 Thread Willy Tarreau
Hi guys,

On Tue, Apr 19, 2016 at 02:54:35PM +0200, Lukas Tribus wrote:
> >We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> >situation after some reloads (-sf). The old haproxy process does not
> >exit and uses 100% CPU; strace shows:
> >epoll_wait(0, {}, 200, 0)   = 0
> >epoll_wait(0, {}, 200, 0)   = 0
> >epoll_wait(0, {}, 200, 0)   = 0
> >epoll_wait(0, {}, 200, 0)   = 0
> >epoll_wait(0, {}, 200, 0)   = 0
> >epoll_wait(0, {}, 200, 0)   = 0
> >
> >In our case, it was a TCP backend tunnelling rsyslog messages. After
> >restarting the local rsyslogd, the load was gone and the old haproxy
> >instance exited. It's hard to tell how many reloads it takes to make
> >haproxy go crazy, or what an exact reproducible test would be. But it
> >does not take hundreds of restarts, rather 10-20 (our reloads are not
> >very frequent), to make haproxy go crazy.
> 
> Also matches this report from December:
> https://www.mail-archive.com/haproxy@formilux.org/msg20772.html

Yep, very likely. The combination of the two reports is very intriguing.
The first one shows the signals being blocked, while the only place where
we block them is __signal_process_queue(), and only while calling the
handlers or performing the wakeup() calls, both of which should be
instantaneous. More importantly, the function cannot return without
unblocking the signals.

I still have no idea what is going on; the code looks simple and clear,
and certainly not compatible with such behaviours. I'm still digging.

Willy




Re: 100% cpu , epoll_wait()

2016-04-19 Thread Willy Tarreau
On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
> On Tue, Apr 19, 2016 at 02:54:35PM +0200, Lukas Tribus wrote:
> > >We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar
> > >situation after some reloads (-sf). The old haproxy process does not
> > >exit and uses 100% CPU; strace shows:
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >epoll_wait(0, {}, 200, 0)   = 0
> > >
> > >In our case, it was a TCP backend tunnelling rsyslog messages. After
> > >restarting the local rsyslogd, the load was gone and the old haproxy
> > >instance exited. It's hard to tell how many reloads it takes to make
> > >haproxy go crazy, or what an exact reproducible test would be. But it
> > >does not take hundreds of restarts, rather 10-20 (our reloads are not
> > >very frequent), to make haproxy go crazy.
> > 
> > Also matches this report from December:
> > https://www.mail-archive.com/haproxy@formilux.org/msg20772.html
> 
> Yep, very likely. The combination of the two reports is very intriguing.
> The first one shows the signals being blocked, while the only place where
> we block them is __signal_process_queue(), and only while calling the
> handlers or performing the wakeup() calls, both of which should be
> instantaneous. More importantly, the function cannot return without
> unblocking the signals.
>
> I still have no idea what is going on; the code looks simple and clear,
> and certainly not compatible with such behaviours. I'm still digging.

OK, in fact it's different. Above we have a busy polling loop, which may
very well be caused by the buffer space miscalculation bug and which results
in a process not completing its job until a timeout strikes. The link to
the other report shows a normal polling with blocked signals.

Willy




Re: [PATCH] use SSL_CTX_set_ecdh_auto() for ecdh curve selection

2016-04-19 Thread Emeric Brun
On 04/18/2016 11:23 PM, David Martin wrote:
> On Mon, Apr 18, 2016 at 3:02 PM, Janusz Dziemidowicz
>  wrote:
>> 2016-04-15 16:50 GMT+02:00 David Martin :
>>> I have tested the current patch with the HAProxy default, a list of curves,
>>> a single curve and also an incorrect curve.  All seem to behave correctly.
>>> The conditional should only skip calling ecdh_auto() if curves_list()
>>> returns 0, in which case HAProxy exits anyway.
>>>
>>> Maybe I'm missing something obvious, this has been a learning experience for
>>> me.
>>
>> You are correct. I guess I shouldn't have been looking at patches
>> during a break at my day job ;)
>> Seems OK to me now, apart from the missing documentation changes ;)
>>
>> --
>> Janusz Dziemidowicz
> 
> Added doc changes :)
> 

Hi All,

I don't know how the curve negotiation works, but I have some questions.

What is the behavior if SSL_CTX_set_ecdh_auto is used on the server side and
the client doesn't support the neg?

In other words:

Is it useful to set both SSL_CTX_set_ecdh_auto and SSL_CTX_set_tmp_ecdh (with
the first curve of the list, for instance) to ensure the first wanted curve
is used if the client doesn't support the neg?

R,
Emeric




Re: [PATCH 1/2] MINOR: Add ability for agent-check to set server maxconn

2016-04-19 Thread Willy Tarreau
Hi Nenad,

On Sun, Apr 17, 2016 at 12:05:04AM +0200, Nenad Merdanovic wrote:
> This is very useful in complex architectures where HAProxy is
> balancing DB connections, for example. We want to keep the maxconn
> high in order to avoid issues with queueing on the LB level when
> there is slowness in another part of the system. An example is an
> architecture where each thread opens multiple DB connections, which,
> if they get stuck in the queue, cause a snowball effect (old connections
> aren't closed, new ones cannot be established). These connections are
> mostly idle and the DB server has no problem handling thousands of them.
> 
> Allowing us to dynamically set maxconn depending on the backend usage
> (LA, CPU, memory, etc.) enables us to have a high maxconn for situations
> like the one above, while lowering it in case there are real issues where
> the backend servers become overloaded (cache issues, DB gets hit hard).
> ---
>  doc/configuration.txt  |  3 +++
>  include/proto/server.h |  8 
>  src/checks.c   | 18 +-
>  src/server.c   | 27 +++
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index c705a09..640c0f3 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -10146,6 +10146,9 @@ agent-check
>  weight is reported on the stats page as "DRAIN" since it has the same
>  effect on the server (it's removed from the LB farm).
>  
> +  - An ASCII representation of a positive integer, followed by a single letter
> +    'm'. Values in this format will set the maxconn of a server.
> +

Your patch looks fine but I'm a bit bothered by the choice of the syntax
here which is neither really intuitive nor future-proof. I even suspect
you had some head-scratching before coming to this.

At least I'd have found it more natural to use "10c" than "10m" to specify
a connection limit, but anyway that's still something we might regret over
the long term. Thus, what do you think about using a completely different
syntax such as "maxconn:10" or "maxconn=10"? It would allow us to seamlessly
extend the language without breaking compatibility with existing products.

On a side note, there is something important to mention in the documentation:
a side effect of doing this that most people do not realize, and which
already affects protocols like ICAP. When a server advertises a maxconn to
all of its clients (here the load balancers), it ends up with N times the
expected maxconn, where N is the number of load balancers.

Thus I think we must make it very clear that the advertised value must
absolutely be understood as *per load balancer*. Maybe in the future we'll
want to support different words depending on whether we advertise the
total maxconn the server supports or the per-client one (for when front
LBs know how many they are).

Thanks,
Willy




Re: Coding style for config files

2016-04-19 Thread Willy Tarreau
Hello Michael,

On Fri, Apr 15, 2016 at 11:39:35PM +0200, Michael Rennecke wrote:
> Hello,
> 
> I know this question is stupid. Is there a coding style for config
> files, like this: http://www.haproxy.org/coding-style.html ?

No, but that could be a very good idea. In general everyone tends to
adopt the format I initially used, which consists of having section keywords
on the left and indenting everything else using spaces or tabs so that
sections are clearly visible.

A style here would only be a recommendation to help people write
readable files, but everyone is obviously free to write as he wants
if he maintains his own files!

Cheers,
Willy




Re: [PATCH] use SSL_CTX_set_ecdh_auto() for ecdh curve selection

2016-04-19 Thread Janusz Dziemidowicz
2016-04-19 18:13 GMT+02:00 Emeric Brun :
> I don't know how the curve negotiation works, but i have some questions.
>
> What is the behavior if SSL_CTX_set_ecdh_auto is used on the server side and
> the client doesn't support the neg?
>
> In other words:
>
> Is it useful to set both SSL_CTX_set_ecdh_auto and SSL_CTX_set_tmp_ecdh (with
> the first curve of the list, for instance) to ensure the first wanted curve
> is used if the client doesn't support the neg?

Not really. In the TLS protocol, there is only one way for a client to
select an elliptic curve, and that is the "Supported Elliptic Curves"
extension. The confusing part is the OpenSSL API. The "old" API, aka
SSL_CTX_set_tmp_ecdh(), allowed only one curve to be selected by the
server. If it was not present in the extension sent by the client, then
bummer, connection error. The new API, SSL_CTX_set_ecdh_auto(), supports
real negotiation, as TLS was always designed to do: the client sends
its list of curves in the extension, and the server tries to find a
matching curve from the list it supports.

There are no clients "not supporting the neg". If the client supports
elliptic curves at all, it must send the list in the extension.

-- 
Janusz Dziemidowicz



Re: 100% cpu , epoll_wait()

2016-04-19 Thread Veiko Kukk

On 19/04/16 18:52, Willy Tarreau wrote:

>> On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:
> OK, in fact it's different. Above we have a busy polling loop, which may
> very well be caused by the buffer space miscalculation bug and which results
> in a process not completing its job until a timeout strikes. The link to
> the other report shows a normal polling with blocked signals.


The process that was created yesterday via soft reload went to 100% CPU
today.


haproxy  29388  5.0  0.0  58772 11700 ?        Rs   Apr17 156:44 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf 1997


Section from strace output:

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(32, 
"\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(32, 
"\366\334si\251\272Y\372\360'/\363\212\246\262w\307[\251\375\314\236whe\302\337\257\25NQ\370"..., 
1024, 0, NULL, NULL) = 1024
sendto(18, 
"\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 8016
sendto(18, 
"\355\265\207\360\357\3046k\364\320\330\30d\247\354\273BE\201\337\4\265#\357Z\231\231\337\365*\242\345"..., 
8376, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 18, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=18, 
u64=18}}) = 0

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(32, 
"@OR\224\335\233\263\347U\245X\376)\240\342\334\242\31\321\322\354\222\276\233\247\316-\263\370)\252U"..., 
8016, 0, NULL, NULL) = 8016

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {{EPOLLOUT, {u32=53, u64=53}}}, 200, 0) = 1
sendto(53, 
"\274'[\24\n\264*b\306\253YA\313A\36\202a\177\317\370K:\302\230\315.\315\215\f&\351\27"..., 
14032, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 14032
sendto(53, 
"\234CS\236wYsf\267\24\276v\325\302\267+a\303\336\250\211x\236\33\23MR_\324\214A\264"..., 
2360, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 2360
recvfrom(55, 
"\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"i\244\305N\242I\177n'4g\211\256%\26X\34il\3374\34HN\22\365\357\211Y\354\306K"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLRDHUP, {u32=53, u64=53}}) = 0
epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"-\r\21\326\326\0\0>\346-?\375\325J\346N\336\353Jz\376\303\373?\226y}\317\257\371\304t"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"\233.\20B\337\343\274\311\212\211\241\244\5\257\221w1{\253Kjh\23?w\357\365\377\335\261\3\215"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
"\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 
15368, 0, NULL, NULL) = 15368
recvfrom(55, 
"\225\256\37Qib\371\377\220l\342\20\2742\271\3360U\224\0375?ju\10\207\235J\267\35\340\367"..., 
1024, 0, NULL, NULL) = 1024
sendto(53, 
",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 
16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 13312
sendto(53, 
"\372\265\334\263\232\2016l2\216\372\261B\26\243\252\204\220\353\f\367\215\331\232\203hI,\260\37\207\357"..., 
3080, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource 
temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=53, 
u64=53}}) = 0

epoll_wait(0, {}, 200, 0)   = 0
recvfrom(55, 
"k\33\342U\260:Z\350\3725>\211R@\20\347\326\363\203\36?\226\304\241\367\263B\242\230\6^\221"..., 
13312, 0, NULL, NULL) = 13312

epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_wait(0, {}, 200, 0)   = 0
epoll_w