Re: reg-test failures on FreeBSD, how to best adapt/skip some tests?

2018-10-01 Thread PiBa-NL

Hi Frederic,

On 1-10-2018 at 16:09, Frederic Lecaille wrote:

- /connection/b0.vtc
probably does not 'really' need abns@ sockets, so changing to unix@ 
would make it testable on more platforms?

Correct. I agree I did not think to replace this part specific to Linux.

Should I send a patch changing that? Or can you fix it easily ;)



- /log/b0.vtc
Not exactly sure why this fails, or why it was supposed to work.
It either produces a timeout, or the s1 server fails to read a 
request, because the tcp health check never sends one..


 ***  s1    0.0 accepted fd 5 127.0.0.1 23986
 **   s1    0.0 === rxreq
  s1    0.0 HTTP rx failed (fd:5 read: Connection reset by peer)


Perhaps the syslog traces could give us more information about what is 
happening here.
Afaik the 'check' on the server line never sends a GET / request, 
which is what fails here.. My other mail has a bit more ideas about that..



- /seamless-reload/b0.vtc
This one specifically tests abns@ socket functionality, so changing 
it to a unix@ socket would likely change the test in such a way that 
it no longer tests what it was meant for..

What would be the best way to skip this test on FreeBSD?


Perhaps we should use the TARGET value to select the VTC files 
directories which should be selected for the OSes.


By default, for Linux, all VTC files in the reg-tests directory should 
be run (found with the find command without the -L option, so as not 
to follow symbolic links).


For instance, for the FreeBSD OS we would create a reg-tests/freebsd 
directory with symbolic links to the Linux reg-tests subdirectories it supports.
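A rough sketch of that layout and of how find behaves with and without -L (all directory and file names below are illustrative, not the real reg-tests contents):

```shell
#!/bin/sh
# reg-tests/ holds the tests; reg-tests/freebsd/ holds symlinks to the
# subdirectories known to work on FreeBSD.
tmp=$(mktemp -d)
mkdir -p "$tmp/reg-tests/connection" "$tmp/reg-tests/log" "$tmp/reg-tests/freebsd"
touch "$tmp/reg-tests/connection/b0.vtc" "$tmp/reg-tests/log/b0.vtc"
ln -s ../connection "$tmp/reg-tests/freebsd/connection"

# Linux: find WITHOUT -L does not descend into the freebsd symlinks,
# so every test is listed exactly once.
linux_count=$(find "$tmp/reg-tests" -name '*.vtc' | wc -l | tr -d ' ')

# FreeBSD: find WITH -L, restricted to the freebsd directory, follows
# the symlinks and sees only the supported tests.
freebsd_count=$(find -L "$tmp/reg-tests/freebsd" -name '*.vtc' | wc -l | tr -d ' ')

echo "linux runs $linux_count tests, freebsd runs $freebsd_count"
rm -rf "$tmp"
```

With this layout, adding a test for FreeBSD support is just one extra symlink, and unsupported directories are simply never linked.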
I think creating a list of tests that can be run on FreeBSD would take 
a lot of maintenance, and I assume 'most' tests will actually be runnable 
on most OSes. So a short list of exceptions is probably easier to 
maintain. In my opinion every test should be run on every OS unless 
there is some good reason not to run it (abns/splicing/stuff..). And if 
possible the reason to exclude a specific test should be documented (a 
single line of text could be enough).
The even bigger issue that Willy raised is with compilation options not 
including lua/ssl/gzip/threads/ stuff, and having lots of tests fail on 
those, while they should be skipped if they rely on a feature that was 
not compiled in. I discussed this a bit with Willy in the other mail 
thread ( https://www.mail-archive.com/haproxy@formilux.org/msg31345.html 
), but I don't think we have defined the perfect way to do it yet..


So that part of the question still stands ;) .. What's the best way to 
skip tests that are not applicable?
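One possible shape for such a skip mechanism: grep the build features out of `haproxy -vv` and only run a test when its requirement is present. The $features text below is a hardcoded sample of what `haproxy -vv` prints; a real script would use `features=$(haproxy -vv)`, and the test names are made up:

```shell
#!/bin/sh
# Sample of the "Built with ..." lines from `haproxy -vv`:
features='Built with OpenSSL version 1.0.2k
Built with transparent proxy support
Built with zlib version 1.2.11'

run_or_skip() {
    # $1 = test file, $2 = pattern that must appear in the build output
    if printf '%s\n' "$features" | grep -q "$2"; then
        echo "RUN  $1"
    else
        echo "SKIP $1 (requires: $2)"
    fi
}

run_or_skip ssl/b0.vtc "OpenSSL"
run_or_skip lua/b0.vtc "Lua"
```

This keeps the exception list implicit: a test is skipped exactly when the build lacks what it needs, with a one-line reason printed.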


Regards,
PiBa-NL (Pieter)





Re: reg-test failures on FreeBSD, how to best adapt/skip some tests?

2018-10-01 Thread PiBa-NL

Hi Frederic,

On 1-10-2018 at 16:19, Frederic Lecaille wrote:

On 09/11/2018 04:51 PM, PiBa-NL wrote:

Hi List,

I was wondering how to best run the reg-tests that are 'valid' for 
FreeBSD.


There are a 2 tests that use abns@ sockets, which seem not available 
on FreeBSD.
Also 1 test is failing for a reason I'm not totally sure is 
expected or not..


- /connection/b0.vtc
probably does not 'really' need abns@ sockets, so changing to unix@ 
would make it testable on more platforms?


- /log/b0.vtc
Not exactly sure why this fails, or why it was supposed to work.
It either produces a timeout, or the s1 server fails to read a 
request, because the tcp health check never sends one..


 ***  s1    0.0 accepted fd 5 127.0.0.1 23986
 **   s1    0.0 === rxreq
  s1    0.0 HTTP rx failed (fd:5 read: Connection reset by peer)


This test relies on timeout values to produce the correct syslog output.
Relying on timing values for reg testing is perhaps a bad idea as the 
results are not deterministic.


Indeed here we want the 5ms timeout client to expire before it reads 
the server response after having waited for 20ms.


I propose to remove this test.


Regarding /log/b0.vtc, I think the test itself is 'mostly' okay.

If we keep it, I do think the 'check' on the server line should be 
changed using one of the following options:

- remove 'check'
- point the check at a different port, with an s2 server that does not use txresp
- add 'option httpchk' and repeat s1 twice..

Any of the changes above would 'fix' the biggest issue with the test as 
far as I can see..


But indeed, short timeouts could occasionally cause random failures, 
for example if multiple tests are run simultaneously..


A decent log line with a 'client timeout' flag is probably something 
that should be tested; perhaps the c1 delay could be set to something 
longer, to 'enforce' that the 'timeout client 5' always hits first 
even when things start competing for (CPU) resources.?


Regards,
PiBa-NL (Pieter)



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,
On 30-9-2018 at 20:38, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 08:22:23PM +0200, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 07:59:34PM +0200, PiBa-NL wrote:

Indeed it works with 1.8, so in that regard I 'think' the test itself is
correct.. Also when disabling threads, or running only 1 client, it still
works.. Then both CumConns and CumReq show 11 for the first stats result.

Hmmm for me it fails even without threads. That was the first thing I
tried when meeting the error in fact. But I need to dig deeper.

So I'm seeing that in fact the count is correct if the server connection
closes first, and wrong otherwise. In fact it fails similarly both for
1.6, 1.7, 1.8 and 1.9 with and without threads. I'm seeing that the
connection count is exactly 10 times the incoming connections while the
request count is exactly 20 times this count. I suspect that what happens
is that the request count is increased on each connection when preparing
to receive a new request. This even slightly reminds me of something, but
I don't remember where I noticed it; I think I saw it when reviewing the
changes needed to be made to HTTP for the native internal representation.

So I think it's a minor bug, but not a regression.

Thanks,
Willy


Not sure; the only difference between 100x FAILED and 100x OK here is 
the version. Command executed and result below.


Perhaps that's just because of the OS / scheduler used, though; I assume 
you're using some Linux distro to test with, which perhaps explains part 
of the differences between your results and mine.. In the end it doesn't 
matter much whether it's a bug or a regression, it still needs a fix ;). 
And well, I don't know if it's just the counter that's wrong, or whether 
there might be bigger consequences somewhere. If it's just the counter, 
then I guess it wouldn't hurt much to postpone a fix to a next (dev?) version.


Regards,

PiBa-NL (Pieter)

root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc

...
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.128) exit=2
#    top  TEST ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc FAILED (0.135) exit=2

100 tests failed, 0 tests skipped, 0 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
Copyright 2000-2018 Willy Tarreau 

root@freebsd11:/usr/ports/net/haproxy-devel # pkg add -f haproxy-1.8.14-selfbuild-reg-tests-OK.txz

Installing haproxy-1.8...
package haproxy is already installed, forced install
Extracting haproxy-1.8: 100%
root@freebsd11:/usr/ports/net/haproxy-devel # varnishtest -q -n 100 -j 16 -k ./haproxy_test_OK_20180831/loadtest/b0-loadtest.vtc

0 tests failed, 0 tests skipped, 100 tests passed
root@freebsd11:/usr/ports/net/haproxy-devel # haproxy -v
HA-Proxy version 1.8.14-52e4d43 2018/09/20
Copyright 2000-2018 Willy Tarreau 






Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,

On 30-9-2018 at 7:46, Willy Tarreau wrote:

Hi Pieter,

On Sun, Sep 30, 2018 at 12:05:14AM +0200, PiBa-NL wrote:

Hi Willy,

I thought let's give those reg-tests another try :) as it's easy to run and
dev3 just came out.
All tests pass on my FreeBSD system, except this one; new reg-test attached.

Pretty much the same test as previously sent, but now with only 4 x 10
connections, which should be fine for conntrack and sysctls (I hope..). It
seems those stats numbers are 'off', or is my expected value not as fixed as
I thought it would be?

Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse.
Indeed it works with 1.8, so in that regard I 'think' the test itself is 
correct.. Also when disabling threads, or running only 1 client, it 
still works.. Then both CumConns and CumReq show 11 for the first stats result.

However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!
I'm not fully sure I agree with that.. While I understand that failing 
reg-tests can be a pita while developing (if you run them regularly), the 
fact is that currently existing tests can already start to fail after 
some major redesign of the code. A few mails back (different mail thread) 
I tested like 10 commits in a row and they all suffered from different 
failing tests; that would imho not be a reason to remove those tests, 
and they didn't stop development.

I'm also seeing that you rely on threads, I think I noticed another test
involving threads. Probably that we should have a specific directory for
these ones that we can disable completely when threads are not enabled,
otherwise this will also destroy tests (and make them extremely slow due
to varnishtest waiting for the timeout if haproxy refuses to parse the
config).
A specific directory will imho not work. How should it be called? 
/threaded_lua_with_ssl_using_kqueue_scheduler_on_freebsd_without_abns_for_haproxy_1.9_and_higher/ 
?
Having varnishtest fail while waiting for a feature that was not 
compiled in is indeed undesirable as well. So some 'smart' way of defining 
'requirements' for a test will be needed, so it can gracefully skip if 
not applicable.. I'm not sure myself what that should look like though.. 
On one side I think the .vtc itself might be the place to define what 
requirements it has; on the other, a separate list/script with logic 
for which tests to run could be nice.. But then who is going to 
maintain that one..
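The ".vtc itself declares its requirements" idea could look roughly like this: a marker comment at the top of the test names the features it needs, and the runner compares that against what the build/OS offers. The "#REQUIRE" marker and the feature names are made up for this sketch:

```shell
#!/bin/sh
# Build a throwaway .vtc that requires two features.
vtc=$(mktemp)
cat > "$vtc" <<'EOF'
#REQUIRE THREADS ABNS
varnishtest "example"
EOF

have="THREADS SSL"   # features available here, however they get detected

missing=''
for req in $(sed -n 's/^#REQUIRE //p' "$vtc"); do
    case " $have " in
        *" $req "*) ;;                     # requirement satisfied
        *) missing="$missing $req" ;;      # requirement not met
    esac
done

if [ -z "$missing" ]; then
    echo "run $vtc"
else
    echo "skip $vtc (missing:$missing)"
fi
rm -f "$vtc"
```

The maintenance burden then stays with the test author: whoever writes a test that needs abns or threads adds one comment line, and no central list has to be kept in sync.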

I think that we should think a bit forward based on these tests. We must
not let varnishtest stop on the first error but rather just log it.

varnishtest can continue on error with -k.
I'm using this little mytest.sh script at the moment; it runs all tests 
and only failed tests produce a lot of logging:

  haproxy -v
  varnishtest -j 16 -k -t 20 ./work/haproxy-*/reg-tests/*/*.vtc > ./mytest-result.log 2>&1
  varnishtest -j 16 -k -t 20 ./haproxy_test_OK_20180831/*/*.vtc >> ./mytest-result.log 2>&1

  cat ./mytest-result.log
  echo "" >> ./mytest-result.log
  haproxy -vv  >> ./mytest-result.log

There is also the -q parameter, but then passing tests are no longer 
logged and only failed tests produce 1 log line each.. 
(I do like to log which tests were executed, though..)

  Then
at the end we could produce a report of successes and failures that would
be easy to diff from the previous (or expected) one. That will be
particularly useful when running the tests on older releases. As an
example, I had to run your test manually on 1.8 because for I-don't-know-
what-reason, the one about the proxy protocol now fails while it used to
work fine last week for the 1.8.14 release. That's a shame that we can't
complete tests just because one randomly fails.
You can continue the tests ( -k ), but better write the output to a 
logfile then, or perhaps combine with -l, which leaves the /tmp/.vtc folder..
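The "report of successes and failures that is easy to diff" idea could be a small post-processing step: reduce a varnishtest log to one sorted "testname STATUS" line per test. The two log lines below mimic varnishtest's per-test summary output; the paths are illustrative:

```shell
#!/bin/sh
log=$(mktemp)
cat > "$log" <<'EOF'
#    top  TEST ./reg-tests/log/b0.vtc FAILED (0.135) exit=2
#    top  TEST ./reg-tests/connection/b0.vtc passed (0.112)
EOF

# Keep only the summary lines, print "path STATUS", and sort so the
# output is stable and diffable against a previous run.
report=$(awk '/TEST .* (passed|FAILED)/ {
    status = ($0 ~ /FAILED/) ? "FAIL" : "PASS"
    for (i = 1; i <= NF; i++)
        if ($i ~ /\.vtc$/) print $i, status
}' "$log" | sort)

printf '%s\n' "$report"
# e.g.: printf '%s\n' "$report" > report.txt && diff previous.txt report.txt
rm -f "$log"
```

Comparing such a report between an older release and a -dev build would show exactly which tests regressed, without re-reading the full verbose logs.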

Thanks,
Willy


Regards,
PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-30 Thread PiBa-NL

Hi Willy,

On 30-9-2018 at 7:56, Willy Tarreau wrote:

On Sun, Sep 30, 2018 at 07:46:24AM +0200, Willy Tarreau wrote:

Well, at least it works fine on 1.8 and not on 1.9-dev3 so I think you
spotted a regression that we have to analyse. However, I'd like to merge
the fix before merging the regtest otherwise it will kill the reg-test
feature until we manage to get the issue fixed!

By the way, could you please explain in simple words the issue you've
noticed ? I tried to reverse the vtc file but I don't understand the
details nor what it tries to achieve. When I'm running a simple test
on a simple config, the CummConns always matches the CumReq, and when
running this test I'm seeing random values there in the output, but I
also see that they are retrieved before all connections are closed

But CurrConns is 0, so connections are (supposed to be?) closed? :

 h1    0.0 CLI recv|CurrConns: 0
 h1    0.0 CLI recv|CumConns: 27
 h1    0.0 CLI recv|CumReq: 27


, so
I'm not even sure the test is correct :-/

Thanks,
Willy


What I'm trying to achieve is, well.. testing for regressions that are 
not yet known to exist in the current stable version.


So what this test does, in short:
It makes 4 clients simultaneously send a request to a threaded haproxy, 
which in turn connects the backend to the frontend 10x and then sends the 
request to the s1 server. The intended purpose is to have several 
connections started and torn down as fast as haproxy can process them, 
while trying to have a high probability of adding/removing items from 
lists/counters from different threads, thus possibly creating problems 
if some lock/sync isn't done correctly. After firing a few requests it 
also verifies the expected counts and results where possible..


History:
I've been bitten a few times with older releases by corruption occurring 
inside the POST data when uploading large (500MB+) files to a server 
behind haproxy. After a few megabytes passed correctly, the resulting 
file would contain differences from the original when compared; the 
upload 'seemed' to succeed though. (This was then solved by installing a 
newer haproxy build..) Also, sometimes threads have locked up or 
crashed things, or the kqueue scheduler turned out to behave differently 
than the others.. I've been trying to test such things manually but found 
I always forget to run some test. This is why I really like the concept of 
having a set of defined tests that validate haproxy is working 
'properly' on the OS I run it on.. Also, when some issue I ran into gets 
fixed I tend to run -dev builds on my production environment for a 
while, and well, it's nice to know that other functionality still works as 
it used to..


When writing this test I initially started with the idea of 
automatically testing a large file transfer through haproxy, but then 
wondered where/how to keep such a file; so I thought transferring a 
'large' header of increasing size 'might' trigger a similar 
condition.. Though in hindsight that might not actually exercise the same 
code paths..


The test I created, with 1-byte growth of the header and 4000 
connections, didn't quite achieve that initial big-file simulation, but 
I still thought it ended up being a nice test, so I submitted it a while 
back ;) .. Anyhow, haproxy wasn't capable of doing much when dev2 was 
tagged, so I wasn't too worried that the test failed at that time.. You 
announced dev2 as such as well, so that was okay. And perhaps the issue 
found then would solve itself when further fixes on top of dev2 were 
added ;).


Anyhow, with dev3 I hoped all regressions would be fixed, but found this 
one still failed on 1.9-dev3. So I tuned the numbers in the previously 
submitted regtest down a little to avoid conntrack/sysctl default 
limits, while still failing the test 'reliably'.. I'm not sure what 
exactly is going on, or how bad it is that these numbers don't match up 
anymore.. Maybe it's only the counter that's not updated in a thread-safe 
way; perhaps there is a bigger issue lurking with sync points and 
whatnot..? Either way, the test should pass as I understand it: the 4 
defined varnish clients got their answers back and CurrConns = 0, and 
adding a 3-second delay between waiting for the clients and checking the 
stats does not fix it... And as you've checked, with 1.8 it does pass. 
Though that too could perhaps be a coincidence; maybe things are now 
processed even faster but in a different order, so the test fails for 
the wrong reason.?.


Hope that makes my thought process somewhat clear :).

Regards,

PiBa-NL (Pieter)




Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-29 Thread PiBa-NL

Hi Willy,

I thought let's give those reg-tests another try :) as it's easy to run 
and dev3 just came out.

All tests pass on my FreeBSD system, except this one; new reg-test attached.

Pretty much the same test as previously sent, but now with only 4 x 10 
connections, which should be fine for conntrack and sysctls (I hope..). 
It seems those stats numbers are 'off', or is my expected value not as 
fixed as I thought it would be?


Tested with:
HA-Proxy version 1.9-dev3-27010f0 2018/09/29
FreeBSD freebsd11 11.1-RELEASE

Results:
 h1    0.0 CLI recv|CumConns: 33
 h1    0.0 CLI recv|CumReq: 65
 h1    0.0 CLI expect failed ~ "CumConns: 41"

If my 'expect' is correct, would the patch be suitable for inclusion 
with the other reg-tests this way?
If you want to rename loadtest to heavytest, or make any other tweaks, 
please feel free to do so.


Regards,
PiBa-NL (Pieter)

On 20-9-2018 at 22:25, PiBa-NL wrote:

Hi Willy,

On 20-9-2018 at 13:56, Willy Tarreau wrote:

For me the test produces like 345 lines of output, as attached, which 
seems not too bad (if the test succeeds).

It's already far too much for a user.


Well, those 345 lines are if it succeeds while in 'verbose' mode; in 
'normal' mode it only produces 1 line of output when successful. 
Pretty much all tests produce 100+ lines of 'logging' if they fail for 
some reason. From what I've seen, varnishtest either produces a bulk of 
logging on a failure, or it only logs the failures. There isn't much 
in between.


As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)



From 28377ffe246ed1db0e0d898fa6263eccdc68c490 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Sat, 15 Sep 2018 01:51:54 +0200
Subject: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
 after running a 4 x 10 looping requests

---
 reg-tests/loadtest/b0-loadtest.vtc | 99 ++
 1 file changed, 99 insertions(+)
 create mode 100644 reg-tests/loadtest/b0-loadtest.vtc

diff --git a/reg-tests/loadtest/b0-loadtest.vtc b/reg-tests/loadtest/b0-loadtest.vtc
new file mode 100644
index ..590924e1
--- /dev/null
+++ b/reg-tests/loadtest/b0-loadtest.vtc
@@ -0,0 +1,99 @@
+# Checks that request and connection counters are properly kept
+
+varnishtest "Connection counters check"
+feature ignore_unknown_macro
+
+server s1 {
+rxreq
+expect req.http.TESTsize == 10
+txresp
+} -repeat 4 -start
+
+syslog Slg_1 -level notice {
+recv
+} -repeat 15 -start
+
+haproxy h1 -W -conf {
+  global
+nbthread 3
+log ${Slg_1_addr}:${Slg_1_port} local0
+#nokqueue
+
+  defaults
+mode http
+option dontlog-normal
+log global
+option httplog
+timeout connect 3s
+timeout client  4s
+timeout server  15s
+
+  frontend fe1
+bind "fd@${fe_1}"
+acl donelooping hdr(TEST) -m len 10
+http-request set-header TEST "%[hdr(TEST)]x"
+use_backend b2 if donelooping
+default_backend b1
+
+  backend b1
+server srv1 ${h1_fe_1_addr}:${h1_fe_1_port}
+
+  frontend fe2
+bind "fd@${fe_2}"
+default_backend b2
+
+  backend b2
+# haproxy 1.8 does not have the ,length converter.
+#acl OK hdr(TEST) -m len 10
+#http-request deny deny_status 200 if OK
+#http-request deny deny_status 400
+
+# haproxy 1.9 does have a ,length converter.
+http-request set-header TESTsize "%[hdr(TEST),length]"
+http-request del-header TEST
+server srv2 ${s1_addr}:${s1_port}
+
+} -start
+
+barrier b1 cond 4
+
+client c1 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+client c2 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+client c3 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+client c4 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+
+client c1 -wait
+client c2 -wait
+client c3 -wait
+client c4 -wait
+
+haproxy h1 -cli {
+send "show info"
+expect ~ "CumConns: 41"
+send "show info"
+expect ~ "CumReq: 42"
+}
-- 
2.18.0.windows.1



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-20 Thread PiBa-NL

Hi Willy,

On 20-9-2018 at 13:56, Willy Tarreau wrote:

For me the test produces like 345 lines of output, as attached, which 
seems not too bad (if the test succeeds).

It's already far too much for a user.


Well, those 345 lines are if it succeeds while in 'verbose' mode; in 
'normal' mode it only produces 1 line of output when successful. Pretty 
much all tests produce 100+ lines of 'logging' if they fail for some 
reason. From what I've seen, varnishtest either produces a bulk of logging 
on a failure, or it only logs the failures. There isn't much in between.


As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)



Re: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-19 Thread PiBa-NL

Hi Willy,

On 19-9-2018 at 7:36, Willy Tarreau wrote:

Hi Pieter,

I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc), and it takes far too
long a time to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".
The test takes like 5 seconds to run here, which is a bit long if 
you get a hundred more similar tests and want to keep tweaking 
developments while running tests in between. It wouldn't hurt, in my 
opinion, to run such a (series of) longer tests before creating a patch 
and submitting it for inclusion on the official git repository, or before 
a release.?. My attempt was to test a bit differently than just looking 
for regressions of known fixed bugs, by putting a little load on 
haproxy so that threads and simultaneous actions 'might' get into 
conflicts/locks/stuff which might, by chance, show up; which is why I 
chose to go a little higher on the number of round-trips with ever 
slightly increasing payload..


For me the test produces like 345 lines of output, as attached, which 
seems not too bad (if the test succeeds).. Besides the 2 instances of CLI 
output for stats, it seems not that much different from other tests..
And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST 
./test/b0-loadtest.vtc passed (4.800


Taking conntrack and ulimit into account, would that mean we can never 
'reg-test' whether haproxy can really handle like 1 connections without 
issue? Or should the environment be configured by the test? That seems 
very tricky at least and probably undesirable.. (I just today 
figured I could run reg-tests also on my production box, to compare 
whether a new build shows issues there that my test box might not.. I 
wouldn't want system settings to be changed by a reg-test run..)



I think we should create a distinct category for such tests
Agreed, which is why I used the currently non-existing '/loadtest/' 
folder. If '/heavy/' is better, that's of course alright with me too.

, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably that such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
have this :

  $ make reg-tests/heavy/conn-counter-3000-req.log
I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop"). 
I would still like to have a way to run all 'applicable' tests with 1 
command, even if it takes an hour or so to verify haproxy is working 
'perfectly'. But e.g. abns@ tests can't work on FreeBSD; they should 
not 'fail' though, perhaps get skipped automatically.?. Anyhow, that's a 
question for my other mail topic ( 
https://www.mail-archive.com/haproxy@formilux.org/msg31195.html )
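The "one log file per heavy test" idea could also be done as a plain shell loop rather than a make target. In the sketch below, varnishtest is stubbed out with a shell function so the example is self-contained and runnable anywhere (and the stub deliberately reports failure); drop the stub and point the loop at real .vtc files to use it:

```shell
#!/bin/sh
# Stub: pretend varnishtest ran the test ($2) and that it failed.
varnishtest() { echo "stub run of $2"; return 1; }

mkdir -p heavy-demo
touch heavy-demo/conn-counter.vtc

fails=''
for vtc in heavy-demo/*.vtc; do
    log=${vtc%.vtc}.log
    # Full output goes to the per-test log; the console only gets a summary.
    varnishtest -l "$vtc" > "$log" 2>&1 || fails="$fails $vtc"
done
[ -n "$fails" ] && echo "FAILED:$fails"
rm -rf heavy-demo
```

Each test then leaves a `<name>.log` next to its `<name>.vtc`, so only the logs of failed tests need reading, which is close to what the proposed `make reg-tests/heavy/<name>.log` target would produce.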

It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, race conditions (eg show sess in threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.

Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.
Okay, no problem :) , I'll keep running this particular test myself for 
the moment; it 'should' be able to pass normally.. (on my environment 
anyhow..)

Thanks,
Willy


Thanks for your comments, and thoughts.

I'm interested in Fred's and anyone else's opinion ;) , and well, maybe 
this particular test case could be replaced by something simpler/faster 
with more or less the same likelihood of catching yet-unknown issues..? 
Looking forward to reactions :) .


Thanks and regards,

PiBa-NL (Pieter)

 top   0.0 extmacro def pwd=/usr/ports/net/haproxy-devel
 top   0.0 extmacro def localhost=127.0.0.1
 top   0.0 extmacro def bad_backend=127.0.0.1 58530
 top   0.0 extmacro def bad_ip=192.0.2.255
 top   0.0 macro def testdir=/usr/ports/net/haproxy-devel/./test
 top   0.0 macro def tmpdir=/tmp/vtc.35996.290f74a9
*top   0.0 TEST ./test/b0-loadtest.vtc starting
**   top   0.0 === varnishtest "Seamless reload issue with abns sockets"
*top   0.0 TEST Seamless reload issue with abns sockets
**   top   0.0 === feature ignore_unknown_macro
**   top   0.0 === server s1 {
**   s10.0 Starting server
**

[PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

2018-09-14 Thread PiBa-NL

Hi List, Willy,

I've created a regtest that checks that when concurrent connections are 
being handled that the connection counters are kept properly.


I think it could be committed as attached. It takes a few seconds to 
run. It currently fails on 1.9-dev2 (also fails on 1.8.13 with kqueue on 
FreeBSD, adding a 'nokqueue' on 1.8.13 makes it succeed though..).


I think it might be a good and reproducible test to run.?

Or does it need more tweaking? Thoughts appreciated :).

Regards,

PiBa-NL (Pieter)

From 4b1af997e796e1bb2098c5f66ac24690841c72e8 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Sat, 15 Sep 2018 01:51:54 +0200
Subject: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
 after running 3000 requests in a loop

---
 reg-tests/loadtest/b0-loadtest.vtc | 94 ++
 1 file changed, 94 insertions(+)
 create mode 100644 reg-tests/loadtest/b0-loadtest.vtc

diff --git a/reg-tests/loadtest/b0-loadtest.vtc b/reg-tests/loadtest/b0-loadtest.vtc
new file mode 100644
index ..f66df5ee
--- /dev/null
+++ b/reg-tests/loadtest/b0-loadtest.vtc
@@ -0,0 +1,94 @@
+# Checks that request and connection counters are properly kept
+
+varnishtest "Seamless reload issue with abns sockets"
+feature ignore_unknown_macro
+
+server s1 {
+rxreq
+expect req.http.TESTsize == 1000
+txresp
+} -repeat 3 -start
+
+syslog Slg_1 -level notice {
+recv
+} -repeat 15 -start
+
+haproxy h1 -W -D -conf {
+  global
+nbthread 3
+log ${Slg_1_addr}:${Slg_1_port} local0
+maxconn 50
+#nokqueue
+
+  defaults
+mode http
+option dontlog-normal
+log global
+option httplog
+timeout connect 3s
+timeout client  4s
+timeout server  15s
+
+  frontend fe1
+maxconn 20001
+bind "fd@${fe_1}"
+acl donelooping hdr(TEST) -m len 1000
+http-request set-header TEST "%[hdr(TEST)]x"
+use_backend b2 if donelooping
+default_backend b1
+
+  backend b1
+fullconn 2
+server srv1 ${h1_fe_1_addr}:${h1_fe_1_port} maxconn 2
+
+  frontend fe2
+bind "fd@${fe_2}"
+default_backend b2
+
+  backend b2
+# haproxy 1.8 does not have the ,length converter.
+acl OK hdr(TEST) -m len 1000
+http-request deny deny_status 200 if OK
+http-request deny deny_status 400
+
+# haproxy 1.9 does have a ,length converter.
+#http-request set-header TESTsize "%[hdr(TEST),length]"
+#http-request del-header TEST
+#server srv2 ${s1_addr}:${s1_port}
+
+} -start
+
+barrier b1 cond 3
+
+client c1 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+client c2 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+client c3 -connect ${h1_fe_1_sock} {
+  timeout 17
+   barrier b1 sync
+txreq -url "/"
+rxresp
+expect resp.status == 200
+} -start
+
+client c1 -wait
+client c2 -wait
+client c3 -wait
+
+haproxy h1 -cli {
+send "show info"
+expect ~ "CumConns: 3001"
+send "show info"
+expect ~ "CumReq: 3002"
+}
-- 
2.18.0.windows.1



making a new reg-test to verify server-state-file, why does it fail at random places?

2018-09-13 Thread PiBa-NL

Hi List,

I'm trying to make a reg-test to verify some behavior of 
server-state-file..
But I can't figure out why it fails at random places when run like 10 
times in a row; can someone provide a clue? I've tried playing with 
'delay 0.1' settings in between commands.. but that doesn't fix everything..


I've run it against "HA-Proxy version 1.9-dev1-26e1a8f 2018/09/12"..

varnishtest -l -n 10 -t 5 -k ./seamless-reload/b1-state.vtc

Any of the following errors might pop up when run like 30 times in a row..

 h1    0.4 CLI expect failed ~ "1 srv1 127.0.0.1 2"

 h1    0.4 CLI expect failed ~ "2 srv2 127.0.0.3 0"

 c5    0.4 EXPECT resp.status (503) == "200" failed

 c6    0.5 EXPECT resp.status (200) == "503" failed

The above errors appear randomly.. sometimes 9 out of 10 tests pass, 
sometimes fewer..


 h1    0.9 CLI expect failed ~ " - 33 -"  << The port change not 
being picked up is a 'known bug' afaik.


Am I writing the test in a wrong way? Should I wait/lock/check something 
after some steps, and if so, how?
Also, it takes a second or more to complete; is there a way to 
make it faster without losing predictability?


Regards,

PiBa-NL (Pieter)


# Checks that changes to state are being preserved with statefile unless the 
config changes

varnishtest "Seamless reload issue with abns sockets"
feature ignore_unknown_macro

server s1 {
rxreq
txresp
} -start

server s2 {
rxreq
txresp
} -repeat 23 -start

haproxy h1 -W -conf {
  global
stats socket ${tmpdir}/h1/stats level admin expose-fd listeners
server-state-file ${tmpdir}/h1/hap_state2

  defaults
load-server-state-from-file global
default-server init-addr last,libc,none
mode http
log global
option httplog
timeout connect 150ms
timeout client  2s
timeout server  2s

  listen testme
bind "fd@${testme}"
server test_abns_server /tmp/wpproc1 send-proxy-v2

  frontend test_abns
bind /tmp/wpproc1 accept-proxy
#http-request deny deny_status 200
default_backend test_be

  backend test_be
server srv1 127.0.0.1:8080 weight 151
server srv2 localhost:8081 weight 150 backup
} -start

shell {
  sed -i "" "s,127.0.0.1:8080,${s1_addr}:${s1_port}," ${tmpdir}/h1/cfg
  kill -USR2 $(cat ${tmpdir}/h1/pid)
}
delay 0.1

haproxy h1 -cli {
send "show servers state"
expect ~ "srv1 127.0.0.1 2"

send "show servers state"
expect ~ "srv2 127.0.0.1 2"
}

client c1 -connect ${h1_testme_sock} {
txreq -url "/"
rxresp
expect resp.status == 200
} -run
shell {
  sed -i "" "s,${s1_addr}:${s1_port},${s2_addr}:${s2_port}," ${tmpdir}/h1/cfg
  kill -USR2 $(cat ${tmpdir}/h1/pid)
}
delay 0.1
client c2 -connect ${h1_testme_sock} {
txreq -url "/"
rxresp
expect resp.status == 200
} -run

haproxy h1 -cli {
send "show servers state"
expect ~ "srv1 127.0.0.1 2"

send "show servers state"
expect ~ "srv2 127.0.0.1 2"


send "set server test_be/srv1 state maint"
expect ~ ""
send "set server test_be/srv2 state maint"
expect ~ ""
}

haproxy h1 -cli {

send "show servers state"
expect ~ "srv1 127.0.0.1 0"

send "show servers state"
expect ~ "2 srv2 127.0.0.1 0"
}

client c3 -connect ${h1_testme_sock} {
txreq -url "/"
rxresp
expect resp.status == 503
} -run
haproxy h1 -cli {
send "set server test_be/srv2 addr 127.0.0.2"
expect ~ ""

send "show servers state"
expect ~ "srv2 127.0.0.2"
}
shell {
echo "show servers state" | socat stdio ${tmpdir}/h1/stats > ${tmpdir}/h1/hap_state2
}
shell {
kill -USR2 $(cat ${tmpdir}/h1/pid)
}
delay 0.1
haproxy h1 -cli {
send "show servers state"
expect ~ "2 srv2 127.0.0.2"

send "set server test_be/srv2 addr 127.0.0.3"
expect ~ ""

send "show servers state"
expect ~ "srv2 127.0.0.3"
}
client c4 -connect ${h1_testme_sock} {
txreq -url "/"
rxresp
expect resp.status == 503
} -run

haproxy h1 -cli {
send "set server test_be/srv1 state ready"
expect ~ ""
send "set server test_be/srv2 state maint"
expect ~ ""

send "show servers state"
expect ~ "1 srv1 127.0.0.1 2"

send "show servers state"
expect ~ "2 srv2 127.0.0.3 0"
}
client c5 -connect ${h1_testme_sock} {
txreq -url "/"
rxresp
expect resp.status == 200
} -repeat 1 -run
haproxy h1 -cli {
send "set server tes

regtest lua/b00002.vtc fails with 1.9-dev2 / master

2018-09-13 Thread PiBa-NL

Hi List, Olivier,

Just tried another run of regtests on FreeBSD, and found that 
lua/b2.vtc fails (coredump, gdb bt below) with today's snapshot: 
HA-Proxy version 1.9-dev2-253006d 2018/09/12

On yesterday's snapshot it works properly. Below are some tests on 
different commits; while the 'log' and 'connection' tests seem to have 
been fixed, the 'lua' check also fails with today's latest commit.


MINOR: h1: parse the Connection header field              98f5cf7  /lua/b2.vtc FAILED
MINOR: conn_streams: Remove wait_list from conn_streams.  7138455  /lua/b2.vtc FAILED (0.029) exit=2  h1 0.0 Bad exit status: 0x008b exit 0x0 signal 11 core 128
MINOR: checks: Give checks their own wait_list.           26e1a8f  /lua/b2.vtc FAILED
MEDIUM: stream_interfaces: Starts receiving from the...   f653528  /log/b0.vtc FAILED
MEDIUM: mux_h2: Revamp the send path when blocking.       8ae735d  /log/b0.vtc FAILED  /connection/b0.vtc TIMED OUT (kill -9)
MINOR: connections: Add a "handle" field to wait_list.    cb1f49f  log/b0.vtc FAILED
MEDIUM: stream_interface: Make recv() subscribe when...
MEDIUM: h2: Don't use a wake() method anymore.            7505f94  log/b0.vtc FAILED
MEDIUM: h2: always subscribe to receive if allowed.       a1411e   log/b0.vtc FAILED
MINOR: h2: Let user of h2_recv() and h2_send() know...    d4dd22   log/b0.vtc FAILED
MEDIUM: connections: Get rid of the recv() method.        af4021   log/b0.vtc FAILED,  s1 2.0 HTTP rx failed (fd:5 read: Connection reset by peer)

MEDIUM: connections/mux: Add a recv and a send+recv...    4cf7fb   OK

Did something go astray?

P.s. should reg-tests maybe get run before the -dev releases get tagged? 
(Or is that already done, and could this be a FreeBSD-specific issue 
that as such was not spotted?)


Regards,

PiBa-NL (Pieter)

#0  si_cs_recv (cs=0x802c71180) at src/stream_interface.c:1163
1163    if (conn->xprt->rcv_pipe && conn->mux->rcv_pipe &&
(gdb) bt full
#0  si_cs_recv (cs=0x802c71180) at src/stream_interface.c:1163
    conn = (struct connection *) 0x802c63700
    si = (struct stream_interface *) 0x802c74848
    ic = (struct channel *) 0x802c74570
    ret = 0
    max = 0
    cur_read = 0
    read_poll = 4
#1  0x0054e11f in si_cs_io_cb (t=0x0, ctx=0x802c74848, state=0) 
at src/stream_interface.c:741

    si = (struct stream_interface *) 0x802c74848
    cs = (struct conn_stream *) 0x802c71180
    ret = 0
#2  0x004b717d in process_stream (t=0x802c79280, 
context=0x802c74500, state=257) at src/stream.c:1660

    srv = (struct server *) 0x5a5c8b
    s = (struct stream *) 0x802c74500
    sess = (struct session *) 0x802c6a1c0
    rqf_last = 0
    rpf_last = 4500720
    rq_prod_last = 8
    rq_cons_last = 46544272
    rp_cons_last = 0
    rp_prod_last = 0
    req_ana_back = 0
    req = (struct channel *) 0x802c74510
    res = (struct channel *) 0x802c74570
    si_f = (struct stream_interface *) 0x802c747f8
    si_b = (struct stream_interface *) 0x802c74848
#3  0x005a5629 in process_runnable_tasks () at src/task.c:381
    t = (struct task *) 0x802c79280
    state = 257
    ctx = (void *) 0x802c74500
    process = (struct task *(*)(struct task *, void *, unsigned 
short)) 0x4b70b0 

    t = (struct task *) 0x802c79280
---Type  to continue, or q  to quit---
    max_processed = 199
#4  0x00518d02 in run_poll_loop () at src/haproxy.c:2511
    next = 0
    exp = -750267441
#5  0x00515930 in run_thread_poll_loop (data=0x8024873d4) at 
src/haproxy.c:2576
    start_lock = {lock = 0, info = {owner = 0, waiters = 0, 
last_location = {function = 0x0, file = 0x0, line = 0}}}

    ptif = (struct per_thread_init_fct *) 0x8c7838
    ptdf = (struct per_thread_deinit_fct *) 0x800f1d7cc
#6  0x000800f18bc5 in pthread_create () from /lib/libthr.so.3
No symbol table info available.
#7  0x in ?? ()
No symbol table info available.
Current language:  auto; currently minimal
(gdb) info thread
  3 process 101654  0x000801e11e3a in _kevent () from /lib/libc.so.7
  2 process 101287  0x000801e11e3a in _kevent () from /lib/libc.so.7
* 1 process 101624  si_cs_recv (cs=0x802c71180) at 
src/stream_interface.c:1163





reg-test failures on FreeBSD, how to best adapt/skip some tests?

2018-09-11 Thread PiBa-NL

Hi List,

I was wondering how to best run the reg-tests that are 'valid' for FreeBSD.

There are 2 tests that use abns@ sockets, which seem not to be 
available on FreeBSD.
Also 1 test is failing for a reason I'm not totally sure is expected 
or not..


- /connection/b0.vtc
probably does not 'really' need abns@ sockets, so changing to unix@ 
would make it testable on more platforms?


- /log/b0.vtc
Not exactly sure why this fails/why it was supposed to work.
It either produces a timeout, or the s1 server fails to read the request 
which the tcp-healthcheck does not send..


    ***  s1    0.0 accepted fd 5 127.0.0.1 23986
    **   s1    0.0 === rxreq
     s1    0.0 HTTP rx failed (fd:5 read: Connection reset by peer)
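If the failure really is the health check doing only a TCP connect (open, then close, never sending a request), the vtest server's rxreq would indeed see a reset. Making the check speak HTTP, as the attached patch does, would look roughly like this (a sketch of just the relevant backend):

```
backend be_app
    # a bare 'check' only opens and closes a TCP connection;
    # 'option httpchk' makes the check send an actual HTTP request
    option httpchk
    server app1 ${s1_addr}:${s1_port} check
```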

- /seamless-reload/b0.vtc
This one specifically mentions testing abns@ socket functionality, so 
changing it to a unix@ socket likely changes the test in such a way that 
it's no longer testing what it was meant for..

What would be the best way to skip this test on FreeBSD?

With a few small changes (attached) I can run all tests like this and 
get the following result:


varnishtest -l ./work/haproxy-ss-20180901/reg-tests/*/*.vtc
#    top  TEST 
./work/haproxy-ss-20180901/reg-tests/connection/b0.vtc passed (0.142)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/log/b0.vtc 
passed (0.136)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/lua/b0.vtc 
passed (0.121)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/lua/b1.vtc 
passed (0.133)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/lua/b2.vtc 
passed (0.186)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/lua/b3.vtc 
passed (0.133)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/lua/h1.vtc 
passed (0.120)
#    top  TEST 
./work/haproxy-ss-20180901/reg-tests/seamless-reload/b0.vtc passed 
(0.143)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/spoe/b0.vtc 
passed (0.011)
#    top  TEST ./work/haproxy-ss-20180901/reg-tests/ssl/b0.vtc 
passed (0.146)
#    top  TEST 
./work/haproxy-ss-20180901/reg-tests/stick-table/b0.vtc passed (0.120)
#    top  TEST 
./work/haproxy-ss-20180901/reg-tests/stick-table/b1.vtc passed (0.122)


The above would be good :).. but it needs changing or skipping some tests..

Next time I make a new build to take into my production system, I would 
like to know that 'all' testable reg-tests are working properly. If 
some tests fail by design (on this platform), it takes more 
administration to figure out whether that was okay or not.
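As a sketch of the 'short exception list' idea: test selection could subtract a small per-OS exclude file from the full set of VTC files. The exclude-file name and layout below are my own invention for illustration, not an existing haproxy convention; the demo builds a throwaway tree so the selection logic can be seen end to end:

```shell
#!/bin/sh
# Demo of per-OS exclude-list test selection: all .vtc files minus an exclusion file.
# Directory layout and exclude-file name are assumptions for illustration.
set -e
dir=$(mktemp -d)
mkdir -p "$dir/reg-tests/seamless-reload" "$dir/reg-tests/lua"
touch "$dir/reg-tests/seamless-reload/b0.vtc" "$dir/reg-tests/lua/b2.vtc"
# e.g. reg-tests/exclude.freebsd lists the abns@-dependent tests
printf '%s\n' "reg-tests/seamless-reload/b0.vtc" > "$dir/reg-tests/exclude.freebsd"
cd "$dir"
selected=$(find reg-tests -name '*.vtc' | grep -v -F -f reg-tests/exclude.freebsd)
echo "$selected"
# real use: echo "$selected" | xargs varnishtest -l
```

This keeps the default "run everything" behaviour on Linux and needs only one short file per platform that has known-unsupported tests.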


Please advise :)

Regards,

PiBa-NL (Pieter)

From 89a5fd48c11dba6a30f468e86e5a7d4bdab6b986 Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Tue, 11 Sep 2018 16:19:21 +0200
Subject: [PATCH] fix freebsd reg-tests it has no abns@ socket ability.. also
 the log/b0.vtc test fails on the health-check either timing out, or
 failing to read the request from the client when it's only performing a tcp
 connection check.

---
 reg-tests/connection/b0.vtc  | 4 ++--
 reg-tests/log/b0.vtc | 3 ++-
 reg-tests/seamless-reload/b0.vtc | 4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/reg-tests/connection/b0.vtc b/reg-tests/connection/b0.vtc
index 3a873848..a2b74f73 100644
--- a/reg-tests/connection/b0.vtc
+++ b/reg-tests/connection/b0.vtc
@@ -36,14 +36,14 @@ haproxy h1 -conf {
 
 listen http
 bind-process 1
-bind abns@http accept-proxy name ssl-offload-http
+bind /tmp/http accept-proxy name ssl-offload-http
 option forwardfor
 
 listen ssl-offload-http
 option httplog
 bind-process 2-4
 bind "fd@${ssl}" ssl crt ${testdir}/common.pem ssl no-sslv3 alpn h2,http/1.1
-server http abns@http send-proxy
+server http /tmp/http send-proxy
 } -start
 
 
diff --git a/reg-tests/log/b0.vtc b/reg-tests/log/b0.vtc
index f0ab7ea1..423c7e7f 100644
--- a/reg-tests/log/b0.vtc
+++ b/reg-tests/log/b0.vtc
@@ -24,7 +24,7 @@ feature ignore_unknown_macro
 server s1 {
 rxreq
 txresp
-} -start
+} -repeat 2 -start
 
 syslog Slg_1 -level notice {
 recv
@@ -50,6 +50,7 @@ frontend fe1
 default_backendbe_app
 
 backend be_app
+option httpchk
 server app1 ${s1_addr}:${s1_port} check
 } -start
 
diff --git a/reg-tests/seamless-reload/b0.vtc b/reg-tests/seamless-reload/b0.vtc
index 498e0c61..e8507523 100644
--- a/reg-tests/seamless-reload/b0.vtc
+++ b/reg-tests/seamless-reload/b0.vtc
@@ -25,10 +25,10 @@ haproxy h1 -W -conf {
 
   listen testme
 bind "fd@${testme}"
-server test_abns_server abns@wpproc1 send-proxy-v2
+server test_abns_server /tmp/wpproc1 send-proxy-v2
 
   frontend test_abns
-bind abns@wpproc1 accept-proxy
+bind /tmp/wpproc1 accept-proxy
 http-request deny deny_status 200
 } -start
 
-- 
2.18.0.windows.1



Re: [PATCH] BUG/MAJOR: thread: lua: Wrong SSL context initialization.

2018-08-30 Thread PiBa-NL

On 30-8-2018 at 10:07, Willy Tarreau wrote:

On Thu, Aug 30, 2018 at 10:03:32AM +0200, Thierry Fournier wrote:

Hi Pieter,

Your patch makes sense !
Good catch.
Willy, could you apply ?

OK now applied, thank you guys!

Willy


Willy, thanks.

Thierry, for the record, it's not my patch; Frederic made the patch. The 
only thing I provided was a 'reg-test' and some gdb traces..


Anyhow, it's fixed now; that's the important part ;)

Regards,
PiBa-NL (Pieter)




Re: [PATCH] BUG/MAJOR: thread: lua: Wrong SSL context initialization.

2018-08-29 Thread PiBa-NL

On 29-8-2018 at 14:29, Olivier Houchard wrote:

On Wed, Aug 29, 2018 at 02:11:45PM +0200, Frederic Lecaille wrote:

This patch is in relation with one bug reproduced by the reg testing file
sent by Pieter in this thread:
https://www.mail-archive.com/haproxy@formilux.org/msg31079.html

Must be checked by Thierry.
Must be backported to 1.8.

Note that Pieter reg testing files reg-tests/lua/b2.* come with this
patch.


Fred.
 From d6d38a354a89b55f91bb9962c5832a089d960b60 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20L=C3=A9caille?= 
Date: Wed, 29 Aug 2018 13:46:24 +0200
Subject: [PATCH] BUG/MAJOR: thread: lua: Wrong SSL context initialization.

When calling the ->prepare_srv() callback for an SSL server, which
depends on the global "nbthread" value, the latter was not yet parsed,
so it was still equal to its default value of 1. This led to bad memory
accesses.

Thank you to Pieter (PiBa-NL) for having reported this issue and
for having provided a very helpful reg testing file to reproduce
this issue (reg-test/lua/b2.*).


That sounds good, nice catch !

And yes thanks Pieter, as usual :)

Olivier


As you've probably already verified, the issue is indeed fixed with this 
patch applied on top of master.


Thanks Frederic & Olivier.

@Thierry, can you give the 'all okay' ? (or not okay, if it needs a 
different fix..)


Regards,
PiBa-NL (Pieter)



Re: lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-28 Thread PiBa-NL

Hi Frederic,

On 28-8-2018 at 11:27, Frederic Lecaille wrote:

On 08/27/2018 10:46 PM, PiBa-NL wrote:

Hi Frederic, Oliver,

Thanks for your investigations :).
I've made a little reg-test (files attached). It's probably not 
'correct' to commit as-is, but it should be enough to get a 
reproduction.. I hope..


changing it to nbthread 1 makes it work every time..(that i tried)

The test actually seems to show a variety of issues.
## Every once in a while it takes like 7 seconds to run a test.. 
During which cpu usage is high..


do you think we can reproduce this 200% CPU usage issue after having 
disabled ssl


With ssl 'disabled' I can run the test 500 times without a single failure..

As for the cpu usage issue, it does not seem to reproduce 'easily' when 
running inside varnishtest.. but that might also be because it dumps its 
core most of the time.
Using the same config that varnishtest generated, changing the ports to 
:80 (for the frontend) and :81 (for stats), and then manually running 
haproxy -f /tmp/vtc.132.456/h1/cfg, after a few curl requests curl hangs 
waiting for haproxy's response while haproxy is running at 100% cpu..


Below are 2 backtraces: one of the 100% cpu usage, and one of a core 
dump. Does that help? Do you need the actual core + binary?


Regards,
PiBa-NL (Pieter)

#
Using 100% cpu:

(gdb) info thread
  Id   Target Id Frame
* 1    LWP 101573 of process 28901 0x000801e11e3a in _kevent () from 
/lib/libc.so.7
  2    LWP 100816 of process 28901 0x000801e11e3a in _kevent () 
from /lib/libc.so.7
  3    LWP 101309 of process 28901 0x00080187a71d in ?? () from 
/usr/local/lib/liblua-5.3.so

(gdb) thread 3
[Switching to thread 3 (LWP 101309 of process 28901)]
#0  0x00080187a71d in ?? () from /usr/local/lib/liblua-5.3.so
(gdb) bt full
#0  0x00080187a71d in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#1  0x00080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#2  0x00080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#3  0x000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#4  0x00438e45 in hlua_ctx_resume (lua=0x8024dbf80, 
yield_allowed=1) at src/hlua.c:1186

    ret = 0
    msg = 0x5a5306  "Hiu\360"
    trace = 0x7fffdfdfcc00 ""
#5  0x0044887a in hlua_applet_http_fct (ctx=0x8024d4a80) at 
src/hlua.c:6716

    si = 0x803081840
    strm = 0x803081500
    res = 0x803081570
    rule = 0x80242d6e0
    px = 0x8024c4400
    hlua = 0x8024dbf80
    blk1 = 0x7fffdfdfcca0 ""
    len1 = 34397581057
    blk2 = 0x803081578 ""
    len2 = 34410599800
---Type  to continue, or q  to quit---
    ret = 0
#6  0x005a78a7 in task_run_applet (t=0x80242db40, 
context=0x8024d4a80, state=16385) at src/applet.c:49

    app = 0x8024d4a80
    si = 0x803081840
#7  0x005a49a6 in process_runnable_tasks () at src/task.c:384
    t = 0x80242db40
    state = 16385
    ctx = 0x8024d4a80
    process = 0x5a77f0 
    t = 0x80242db40
    max_processed = 200
#8  0x0051a6b2 in run_poll_loop () at src/haproxy.c:2386
    next = -2118609833
    exp = -2118610700
#9  0x00517672 in run_thread_poll_loop (data=0x8024843c8) at 
src/haproxy.c:2451
    start_lock = {lock = 0, info = {owner = 0, waiters = 0, 
last_location = {function = 0x0, file = 0x0, line = 0}}}

    ptif = 0x8c1980 
    ptdf = 0x800f177cc
#10 0x000800f12bc5 in ?? () from /lib/libthr.so.3
No symbol table info available.
#11 0x in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000


##
Core dump:

gdb --core haproxy.core /usr/local/sbin/haproxy
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Core was generated by `haproxy -f /tmp/vtc.28884.6c5c88f3/h1/cfg'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.5...done.
Loaded symbols for /lib/libcrypt.so.5
Reading symbols from /lib/libz.so.6...done.
Loaded symbols for /lib/libz.so.6
Reading symbols from /lib/libthr.so.3...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /usr/lib/libssl.so.8...done.
Loaded symbols for /usr/lib/libssl.so.8
Reading symbols from /lib/libcrypto.so.8...done.
Loaded symbols for /lib/libcrypto.so.8
Reading symbols from /usr/local/lib/liblua-5.3.so...done.
Loaded symbols for /usr/local/lib/liblua-5.3.so
Reading symbols from /lib/li

Re: lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-27 Thread PiBa-NL

Hi Frederic, Oliver,

Thanks for your investigations :).
I've made a little reg-test (files attached). It's probably not 'correct' 
to commit as-is, but it should be enough to get a reproduction.. I hope..


changing it to nbthread 1 makes it work every time..(that i tried)

The test actually seems to show a variety of issues.
## Every once in a while it takes like 7 seconds to run a test.. During 
which cpu usage is high..


     c0    7.6 HTTP rx timeout (fd:5 7500 ms)

## But most of the time, it just doesn't finish with a correct result 
(I've seen haproxy do core dumps also while testing..). There is of 
course the possibility that I did something wrong in the lua as well...


Does the test itself work for you guys? (with nbthread 1)

Did I do something crazy in the lua code? I do have several loops.. 
but I don't think that's where it 'hangs'?


Regards,

PiBa-NL (Pieter)

Luacurl = {}
Luacurl.__index = Luacurl
setmetatable(Luacurl, {
__call = function (cls, ...)
return cls.new(...)
end,
})
function Luacurl.new(server, port, ssl)
local self = setmetatable({}, Luacurl)
self.sockconnected = false
self.server = server
self.port = port
self.ssl = ssl
self.cookies = {}
return self
end

function Luacurl:get(method,url,headers,data)
core.Info("MAKING SOCKET")
if self.sockconnected == false then
  self.sock = core.tcp()
  if self.ssl then
local r = self.sock:connect_ssl(self.server,self.port)
  else
local r = self.sock:connect(self.server,self.port)
  end
  self.sockconnected = true
end
core.Info("SOCKET MADE")
local request = method.." "..url.." HTTP/1.1"
if data ~= nil then
request = request .. "\r\nContent-Length: "..string.len(data)
end
if headers ~= nil then -- was 'null', an undefined global in Lua; 'nil' intended
for h,v in pairs(headers) do
request = request .. "\r\n"..h..": "..v
end
end
cookstring = ""
for cook,cookval in pairs(self.cookies) do
cookstring = cookstring .. cook.."="..cookval.."; "
end
if string.len(cookstring) > 0 then
request = request .. "\r\nCookie: "..cookstring
end

request = request .. "\r\n\r\n"
if data and string.len(data) > 0 then
request = request .. data
end
--print(request)
core.Info("SENDING REQUEST")
self.sock:send(request)

--  core.Info("PROCESSING RESPONSE")
return processhttpresponse(self.sock)
end

function processhttpresponse(socket)
local res = {}
core.Info("1")
res.status = socket:receive("*l")
core.Info("2")

if res.status == nil then
core.Info(" processhttpresponse RECEIVING status: NIL")
return res
end
core.Info(" processhttpresponse RECEIVING status:"..res.status)
res.headers = {}
res.headerslist = {}
repeat
core.Info("3")
local header = socket:receive("*l")
if header == nil then
return "error"
end
local valuestart = header:find(":")
if valuestart ~= nil then
local head = header:sub(1,valuestart-1)
local value = header:sub(valuestart+2)
table.insert(res.headerslist, {head,value})
res.headers[head] = value
end
until header == ""
local bodydone = false
if res.headers["Connection"] ~= nil and res.headers["Connection"] == "close" then
--  core.Info("luacurl processresponse with connection:close")
res.body = ""
repeat
core.Info("4")
local d = socket:receive("*a")
if d ~= nil then
res.body = res.body .. d
end
until d == nil or d == 0
bodydone = true
end
if bodydone == false and res.headers["Content-Length"] ~= nil then
res.contentlength = tonumber(res.headers["Content-Length"])
if res.contentlength == nil then
  core.Warning("res.contentlength ~NIL = "..res.headers["Content-Length"])
end
--  core.Info("luacur, contentlength="..res.contentlength)
res.body = ""
repeat
   

lua script, 200% cpu usage with nbthread 3 - haproxy hangs - __spin_lock - HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

2018-08-25 Thread PiBa-NL

Hi List, Thierry, Olivier,

Using a lua-socket with connect_ssl and haproxy running with nbthread 
3.. results in haproxy hanging with 3 threads for me.


This happens while using both the 1.9-7/30 version (with the 2 extra 
patches from Olivier avoiding 100% on a single thread) and also a build 
of today's snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25


Below info is at the bottom of the mail:
- haproxy -vv
- gdb backtraces

This one is easy to reproduce after just a few calls to the lua function 
with the lua code I'm writing on a test-box.. So if a 'simple' config 
that makes a reproduction is desired, I can likely come up with one.

Same lua code with nbthread 1 seems to work properly.

Is the below info (the stack traces) enough to come up with a fix? If 
not, let me know and I'll try to make a small reproduction of it.



root@freebsd11:~ # haproxy -vv
HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector 
-fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement 
-fwrapv -fno-strict-overflow -Wno-address-of-packed-member 
-Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols markes as <default> cannot be specified using 'proto' keyword)
  <default> : mode=TCP|HTTP   side=FE|BE
  h2 : mode=HTTP   side=FE

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe

root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 39649
Reading symbols from /usr/local/sbin/haproxy...done.
[New LWP 101651 of process 39649]
[New LWP 101652 of process 39649]
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols 
found)...done.

Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols 
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols 
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging 
symbols found)...done.

Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols 
found)...done.

[Switching to LWP 101650 of process 39649]
0x000801e11e3a in _kevent () from /lib/libc.so.7
(gdb) info thread
  Id   Target Id Frame
* 1    LWP 101650 of process 39649 0x000801e11e3a in _kevent () from 
/lib/libc.so.7
  2    LWP 101651 of process 39649 0x00437b92 in __spin_lock 
(lbl=LUA_LOCK, l=0x8cf1d8 , func=0x62a781 
"hlua_ctx_resume",
    file=0x62a328 "src/hlua.c", line=1070) at 
include/common/hathreads.h:731
  3    LWP 101652 of process 39649 0x00080187a70c in ?? () from 
/usr/local/lib/liblua-5.3.so

(gdb) bt full

Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-20 Thread PiBa-NL

Hi Olivier,

On 17-8-2018 at 14:51, Willy Tarreau wrote:

On Fri, Aug 17, 2018 at 01:41:54PM +0200, Olivier Houchard wrote:

That is true, this one is not a bug, but a pessimization, by using the global
update_list which is more costly than the local one.

Patches attached to do as suggested.

Applied, thank you!
willy


Just a little update :)

The '1.9-dev0-48d92ee 2018/07/30' + your initial 2 patches has been 
running correctly for 4+ days now, using as little cpu as can be 
expected for its little workload. I think I can call it 'fix confirmed', 
as you already knew ;). Previously the issue would likely have returned 
within this time period..


I'll keep it running for a few more days, and then switch back to 
nbthread 3.. Till next time ;)


Thanks again!

Best regards,

PiBa-NL (Pieter)




Re: 100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-15 Thread PiBa-NL

Hi List,

Anyone got an idea how to debug this further?
Currently it's running at 100% again; any pointers for debugging the 
process while it's running would be appreciated.


Or should I compile again from current master and 'hope' it doesn't return?

b.t.w. truss output is as follows:
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 0.0 })  = 0 (0x0)

Regards,
PiBa-NL (Pieter)

On 8-8-2018 at 22:49, PiBa-NL wrote:

Hi List,

I've got a weird issue.. and I'm not sure where/how to continue digging 
at the moment...


Using nbthread=1 nbproc=1, a few lua scripts, ssl offloading / http 
traffic.. Only a few connections < 100...


Sometimes haproxy starts using 100% cpu.. after a few days.. (which 
makes it hard to debug..)
Currently running with version 'HA-Proxy version 1.9-dev0-48d92ee 
2018/07/30'
I've run some commands against the haproxy socket like 'show activity'; 
as can be seen, there are lots of loops and task wakeups in just a 
second of time.


[2.4.3-RELEASE][root@pfsense_3]/root: /usr/local/pkg/haproxy/haproxy_socket.sh show activity

show activity thread_id: 0
date_now: 1533754729.799405
loops: 828928664
wake_cache: 845396
wake_tasks: 827400248
wake_signal: 0
poll_exp: 828245644
poll_drop: 17451
poll_dead: 0
poll_skip: 0
fd_skip: 0
fd_lock: 0
fd_del: 0
conn_dead: 0
stream: 101147
empty_rq: 1242050
long_rq: 0

[2.4.3-RELEASE][root@pfsense_3]/root: /usr/local/pkg/haproxy/haproxy_socket.sh show activity

show activity thread_id: 0
date_now: 1533754731.084664
loops: 829000230
wake_cache: 845398
wake_tasks: 827471812
wake_signal: 0
poll_exp: 828317210
poll_drop: 17452
poll_dead: 0
poll_skip: 0
fd_skip: 0
fd_lock: 0
fd_del: 0
conn_dead: 0
stream: 101149
empty_rq: 1242050
long_rq: 0
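The two samples above are about 1.29 seconds apart; a quick delta of the counters (taken straight from the two snapshots above) shows how hot the loop is, and that nearly every poll loop corresponds to a task wakeup:

```shell
# Delta of the two 'show activity' snapshots above: almost every poll
# loop is a task wakeup, consistent with a task being re-queued
# immediately after it runs.
awk 'BEGIN {
    dt = 1533754731.084664 - 1533754729.799405   # date_now delta (s)
    dl = 829000230 - 828928664                   # loops delta
    dw = 827471812 - 827400248                   # wake_tasks delta
    printf "%.0f loops/s, %.0f task wakeups/s\n", dl / dt, dw / dt
}'
```

For a box serving fewer than 100 connections, tens of thousands of task wakeups per second is the smoking gun rather than normal load.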

Other than that, I've tried to attach gdb and step through / log some 
functions.. it passes through.


With a gdb 'command' file like the one below, I created a little log of 
the function breakpoints hit:

set pagination off
set height 0
set logging on
delete
rbreak haproxy.c:.
rbreak session.c:.
rbreak hlua.c:.
rbreak task.c:.
commands 1-9
cont
end
cont

That got me a log with the following content.. as you can see, it 
'seems' to be looping over the same task multiple times, which might 
not even be a problem: the task t=0x802545a60 expires and wakes up, 
expires and wakes up again:


Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185    src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209    in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 
) at src/task.c:72

72    in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378    src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275    src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, 
context=0x8024483a0, state=513) at src/session.c:389

389    src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185    src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209    in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 
) at src/task.c:72

72    in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378    src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275    src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, 
context=0x8024483a0, state=513) at src/session.c:389

389    src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185    src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209    in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 
) at src/task.c:72

72    in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378    src/haproxy.c: No such file or directory.

Breakpoint 252, process_runnable_tasks () at src/task.c:275
275    src/task.c: No such file or directory.

Breakpoint 51, session_expire_embryonic (t=0x802545a60, 
context=0x8024483a0, state=513) at src/session.c:389

389    src/session.c: No such file or directory.

Breakpoint 249, __task_queue (task=0x802545a60) at src/task.c:185
185    src/task.c: No such file or directory.

Breakpoint 253, wake_expired_tasks () at src/task.c:209
209    in src/task.c

Breakpoint 250, __task_wakeup (t=0x802545a60, root=0x8ced50 
) at src/task.c:72

72    in src/task.c

Breakpoint 41, sync_poll_loop () at src/haproxy.c:2378
2378    src/haproxy.c: No such file or directory.

100% cpu usage 1.9-dev0-48d92ee 2018/07/30, task.?. but keeps working.. (nbthread 1)

2018-08-08 Thread PiBa-NL
Breakpoint 252, process_runnable_tasks () at src/task.c:275
275    src/task.c: No such file or directory.


haproxy -vv
HA-Proxy version 1.9-dev0-48d92ee 2018/07/30
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector 
-fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement 
-fwrapv -fno-strict-overflow -Wno-address-of-packed-member 
-Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe

But I'm not sure what to do next.. Today it happened again and I tried to 
run the gdb log command with all haproxy source files.. but then it 
stopped working completely.. (or at least so slow it didn't properly 
respond anymore..) so I had to abort and restart..


Is there any extra info I can gather next time? A different gdb command 
script to try and run?
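
One relatively low-impact way to capture more state next time, without stepping interactively (which stalled the process before), is a batch command file for gdb. This is a generic sketch, not something from this thread; the log path and filename are arbitrary:

```gdb
# save as dump_threads.gdb and run:  gdb -batch -x dump_threads.gdb -p <pid>
set pagination off
set logging file /tmp/haproxy_gdb.log
set logging on
info threads
thread apply all bt full
detach
```

Because it detaches right after dumping the backtraces, the process is only paused briefly instead of being held under the debugger.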


Should I try a newer version? (I did have it with a previous build from 
a week earlier also.., not sure if it happened before that..)


Could it be because of the OpenSSL version mismatch? (not sure how easy 
it is for me to compile it against the 'correct' version.. I never 
seemed to have issues with that before..)


Hoping someone has an idea how to debug it further/differently or 
perhaps create a patch that might provide extra information when it 
occurs again?


Regards,

PiBa-NL (Pieter)




Re: haproxy 1.8.12 / 1.9- 20180623 / stopping process hangs with threads (100% cpu) on -sf reload / FreeBSD

2018-07-20 Thread PiBa-NL

Thanks Christopher & Willy,

Op 20-7-2018 om 14:26 schreef Willy Tarreau:

Op 20-7-2018 om 10:43 schreef Christopher Faulet:

OK finally I've merged it because it obviously fixes a bug.
Willy


Confirmed fixed with current master's:
HA-Proxy version 1.8.12-5e100b4 2018/07/20
HA-Proxy version 1.9-dev0-842ed9b 2018/07/20

(Well at least my reproduction doesn't work anymore.. while it did quite 
easily before.) So that's good ;)


Regards,
PiBa-NL (Pieter)




Re: haproxy 1.8.12 / 1.9- 20180623 / stopping process hangs with threads (100% cpu) on -sf reload / FreeBSD

2018-07-17 Thread PiBa-NL

Hi Christopher,

Op 17-7-2018 om 10:09 schreef Christopher Faulet:

Could you try to revert the following commit please ?

 * ba86c6c25 MINOR: threads: Be sure to remove threads from 
all_threads_mask on exit


Without this specific commit the termination of the old process works 
'properly'.
That is.. for testing I used the 1.9 snapshot of 20180714 and included a 
little patch to remove the 'atomic and'.. which is basically what that 
commit added..

 #ifdef USE_THREAD
-    HA_ATOMIC_AND(&all_threads_mask, ~tid_bit);
 if (tid > 0)
     pthread_exit(NULL);
 #endif

Also the snapshot of 20180622 + 
'0461-BUG-MEDIUM-threads-Use-the-sync-point-to-che-1.9-dev0.patch' works 
okay.


Though I guess just reverting that line is not the right fix ;).

Regards,
PiBa-NL (Pieter)



haproxy 1.8.12 / 1.9- 20180623 / stopping process hangs with threads (100% cpu) on -sf reload / FreeBSD

2018-07-16 Thread PiBa-NL

Hi List,

With a build of 1.8.12 (and the 1.9 snapshot of 20180623) I'm getting 
the 'old' haproxy process taking up 100% cpu usage when using 3 threads in 
the config and reloading with the -sf parameter. I'm using FreeBSD.. (It 
also happens with the 14-7 snapshot.)


It seems to happen after 1 thread quits, one of the others gets out of 
control.

Most of the time it happens after the first reload.:
haproxy -f /var/etc/haproxy/haproxy.cfg -D
haproxy -f /var/etc/haproxy/haproxy.cfg -D -sf 19110

The main features I use are: ssl offloading / lua / threads
Only a little to no traffic is passing through though; I'm seeing this 
behavior also on my inactive production node. The strange part so far 
is that I could not reproduce it yet on my test machine.
If someone has an idea on how to patch or what direction to search 
for a fix I'm happy to try.


If there is nothing obvious that can be spotted with the info from the 
stack traces of both 1.8 and 1.9 below I'll try and dig further tomorrow 
:). Thanks in advance for anyone's time :).


Regards,
PiBa-NL (Pieter)

p.s.
I CC'ed Christopher as he seems to have made the last 2 patches going 
into 20180623. I'm hoping he has a clue on what to do next :).


## Below stack traces are from 1.8.12.. ##

[2.4.3-RELEASE][admin@pfsense_3]/root: /usr/local/bin/gdb --pid 2136 
/usr/local/sbin/haproxy

GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/haproxy...done.
Attaching to program: /usr/local/sbin/haproxy, process 2136
[New LWP 101099 of process 2136]
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols 
found)...done.

Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols 
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols 
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging 
symbols found)...done.

Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols 
found)...done.

[Switching to LWP 100345 of process 2136]
0x000800f0f91c in ?? () from /lib/libthr.so.3
(gdb) thread info
Invalid thread ID: info
(gdb) info thread
  Id   Target Id Frame
* 1    LWP 100345 of process 2136 0x000800f0f91c in ?? () from 
/lib/libthr.so.3
  2    LWP 101099 of process 2136 thread_sync_barrier (barrier=0x8bc4e0 
) at src/hathreads.c:112

(gdb) bt full
#0  0x000800f0f91c in ?? () from /lib/libthr.so.3
No symbol table info available.
#1  0x000800f0bf97 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x0050bd4f in main (argc=6, argv=0x7fffec70) at 
src/haproxy.c:3077

    tids = 0x80249cf40
    threads = 0x802487240
    i = 2
    old_sig = {__bits = {1073741824, 0, 0, 0}}
    blocked_sig = {__bits = {4227856759, 4294967295, 4294967295, 
4294967295}}

    err = 0
    retry = 200
    limit = {rlim_cur = 2282, rlim_max = 2282}
    errmsg = 
"\000\354\377\377\377\177\000\000\250\354\377\377\377\177\000\000p\354\377\377\377\177\000\000\006\000\000\000\000\000\000\000\270\030\241\374+/\213\032`e\213\000\000\000\000\000h\354\377\377\377\177\000\000\250\354\377\377\377\177\000\000p\354\377\377\377\177\000\000\006\000\000\000\000\000\000\000\020\354\377\377\377\177\000\000r3\340\001\b\000\000\000\001\000\000"

    pidfd = 27
(gdb) thread 2
[Switching to thread 2 (LWP 101099 of process 2136)]
#0  thread_sync_barrier (barrier=0x8bc4e0 ) at 
src/hathreads.c:112

112 src/hathreads.c: No such file or directory.
(gdb) bt full
#0  thread_sync_barrier (barrier=0x8bc4e0 ) at 
src/hathreads.c:112

    old = 7
#1  0x005ae23f in thread_exit_sync () at src/hathreads.c:151
    barrier = 7
#2  0x0051260d in sync_poll_loop () at src/haproxy.c:2391
    stop = 1
#3  0x00512533 in run_poll_loop () at src/haproxy.c:2438
    next = -1636293614
    exp = -1636293614
#4  0x000

Re: dev1.9 2018/06/05 threads cpu 100% spin_lock v.s. thread_sync_barrier

2018-06-12 Thread PiBa-NL

Hi Willy,

Op 12-6-2018 om 14:31 schreef Willy Tarreau:

This one is not known yet, to the best of my knowledge, or at least
not reported yet.

Okay :) I guess I'll keep an eye on whether it happens again.

Is there something I can do to find out more info if it happens again? 
Or maybe, before that, build with more specific debug info so that if it 
happens again more info would be retrievable?
(My usual way of 'debugging' the relatively easy-to-reproduce 
issues is just cramming a lot of printf statements into the code until 
something makes sense, but with an issue that only happens once in a 
blue moon (so far), that doesn't work very well...)


For the moment it hasn't happened again. I suspend/resume the VM it 
happened on almost daily (it's running on my workstation, which is 
shut down overnight); the VM also has haproxy running on it and some other 
stuff..


If it does happen again, would any other gdb information be 
helpful? Inspecting specific variables or creating a memory dump or 
something? (Please give a hint about the command/option to call if 
applicable/possible; I'm still a rookie with gdb..) P.s. I'm on FreeBSD, 
not sure if that matters for some of gdb's available options or something..


Would the complete configuration be helpful? There is a lot of useless 
stuff in there, because it's my test/development VM, and because of 
the lack of it happening again there is no good opportunity to shrink it 
down to specific options/parts..


Thanks for your analysis; though I'm not sure what to do with the info 
at the moment, I do hope you guys together can find a likely culprit 
from the information given ;).


Regards,
PiBa-NL (Pieter)



dev1.9 2018/06/05 threads cpu 100% spin_lock v.s. thread_sync_barrier

2018-06-11 Thread PiBa-NL

Hi List,

I've got no clue how I got into this state ;) and maybe there is nothing 
wrong.. (well, I did resume a VM that was suspended for half a day..)


Still, I thought it might be worth reporting, or perhaps it's solved already, 
as there are a few fixes for threads after the 6-6 snapshot that I built 
with..
Sometimes all that some people need is half an idea to find a problem... 
So maybe there is something that needs fixing??


Haproxy is running with 3 threads at 300% cpu usage, .. some lua, almost no 
traffic, in a VM that just resumed operation and is still going through 
its passes to initialize its NICs, and some stuff that noticed the clock 
jumped on its back and its DHCP lease expired or something like that.. 
anyhow, lots of things going on at that moment..


Below are some of the details I've got about the threads: one is spinning, 
the others seemingly waiting for spin_locks.?.


Like I wrote, not sure if it's 'something', and I don't know yet if I can 
reproduce it a second time.
If more info is needed, please let me know and I'll try to provide it. 
But at the moment it's a one-time occurrence, I think..

If there is nothing obvious wrong, for now maybe just ignore this mail.
Also I'll update to the latest snapshot 2018/06/08. Maybe I won't see it ever 
again :).


haproxy -vv
HA-Proxy version 1.9-dev0-cc0a957 2018/06/05
Copyright 2000-2017 Willy Tarreau 

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -pipe -g -fstack-protector -fno-strict-aliasing 
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-fno-strict-overflow -Wno-address-of-packed-member -Wno-null-dereference 
-Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2o-freebsd  27 Mar 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe



(gdb) info threads
  Id   Target Id Frame
* 1    LWP 100660 of process 56253 0x005b0202 in 
thread_sync_barrier (barrier=0x8bc690 ) at 
src/hathreads.c:109
  2    LWP 101036 of process 56253 0x0050874a in 
process_chk_conn (t=0x8025187c0, context=0x802482610, state=33) at 
src/checks.c:2112
  3    LWP 101037 of process 56253 0x0050b58e in 
enqueue_one_email_alert (p=0x80253f400, s=0x8024dec00, q=0x802482600,
    msg=0x7fffdfdfc770 "Health check for server 
Test-SNI_ipvANY/srv451-4 failed, reason: Layer4 connection problem, 
info: \"General socket error (Network is unreachable)\", check duration: 
0ms, status: 0/2 DOWN") at src/checks.c:3396

(gdb) next
110 in src/hathreads.c
(gdb) next
109 in src/hathreads.c
(gdb) next
110 in src/hathreads.c
(gdb) next
109 in src/hathreads.c
(gdb) next
110 in src/hathreads.c
(gdb) next
109 in src/hathreads.c
(gdb) next


Command name abbreviations are allowed if unambiguous.
(gdb) thread 1
[Switching to thread 1 (LWP 100660 of process 56253)]
#0  thread_sync_barrier (barrier=0x8bc690 ) 
at src/hathreads.c:109

109 src/hathreads.c: No such file or directory.
(gdb) bt full
#0  thread_sync_barrier (barrier=0x8bc690 ) 
at src/hathreads.c:109

    old = 7
#1  0x005b0038 in thread_enter_sync () at src/hathreads.c:122
    barrier = 1
#2  0x0051737c in sync_poll_loop () at src/haproxy.c:2380
No locals.
#3  0x005172ed in run_poll_loop () at src/haproxy.c:2432
    next = -357534104
    exp = -357534104
#4  0x00514670 in run_thread_poll_loop (data=0x802491380) at 
src/haproxy.c:2462

    start_lock = 0
    ptif = 0x8af8f8 
    ptdf = 0x7fffec80
#5  0x00511199 in main (argc=10, argv=0x7fffec28) at 
src/haproxy.c:3052

    tids = 0x802491380
    threads = 0x8024831a0
    i = 3
    err = 0
    retry = 200
    limit = {rlim_cur = 6116, rlim_max = 6116}
    errmsg = 

1.9dev, lua, socket:settimeout() is not being honored and continuous to wait

2018-05-26 Thread PiBa-NL

Hi Thierry,

There is still 'something' remaining where 'socket:settimeout' is not 
honored. See the attached script and output below (a slightly modified 
version of the one from before), ran against the same ' ./tcploop 81 L W 
N20 A R S:"response1\r\n" R P6000 S:"response2\r\n" R [ F K ] ' with the 
6 second delay as before.


I've tested this with the previous 2 patches (for the socket-deadlock 
and socket-scheduling) applied against the current master branch.


Should it wait for response2 when the timeout is set to 1? Or should 
that have produced some sort of error / nil response?


As can be seen from the output below, the second socket:receive() takes the 
complete 6 seconds that tcploop takes to produce its second response. 
Can you take a look? Thanks in advance :).


Regards,

PiBa-NL (Pieter)

---
response1response2

Timing:
[19:42:14.8185]: wait 1 sec
[19:42:15.8632]: create socket
[19:42:15.8632]: set timeout 1 sec
[19:42:15.8632]: connect
[19:42:15.8634]: send
[19:42:15.8634]: receive
[19:42:15.8635]: send
[19:42:15.8635]: receive
[19:42:21.9005]: close
[19:42:21.9005]: done
---

timers = {}
function addlogstamp(text)
  table.insert(timers, {core.now(), text})
end
function timestamp_to_string(t)
  local res = os.date("%T",t.sec).."."..string.format("%04d", t.usec//100)
  return res
end

  addlogstamp("wait 1 sec")
  core.Info("Wait for it..")
  core.sleep(1)

  local result = ""
  addlogstamp("create socket")
con = core.tcp()
  addlogstamp("set timeout 1 sec")
con:settimeout(1)
  addlogstamp("connect")
con:connect("127.0.0.1",81)
  addlogstamp("send")
con:send("Test1\r\n")
  addlogstamp("receive")
r = con:receive("*l")
result = result .. tostring(r)
  addlogstamp("send")
con:send("Test\r\n")
  addlogstamp("receive")
r2 = con:receive("*l")
  addlogstamp("close")
result = result .. tostring(r2)
con:close()
  addlogstamp("done")

result = result .. "\n\nTiming:\n"
for x,t in pairs(timers) do
  result = result ..  "["..timestamp_to_string(t[1]) .. "]: "..t[2] .. "\n"
end
return result

Re: [PATCH] Re: 1.8.8 & 1.9dev, lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-26 Thread PiBa-NL

Hi Thierry,

Op 25-5-2018 om 15:40 schreef Thierry FOURNIER:

On Fri, 18 May 2018 22:17:00 +0200
PiBa-NL <piba.nl@gmail.com> wrote:


Hi Thierry,

Op 18-5-2018 om 20:00 schreef Thierry FOURNIER:

Hi Pieter,

Could you test the attached patch ? It seems to fix the problem, but I
have some doubts about the reliability of the patch.

Thierry

The crash seems 'fixed' indeed.. but the lua script below now takes
5 seconds instead of 150ms.

Regards,
PiBa-NL (Pieter)

con = core.tcp()
con:connect_ssl("216.58.212.132",443) --google: 216.58.212.132
request = [[GET / HTTP/1.0

]]
con:send(request)
res = con:receive("*a")
con:close()

One bug can hide another bug :-) I catch both. Could you test ?

If the result is positive I join also the backport for 1.6 and 1.7

Thierry


Thanks, seems both the hang and the 2nd uncovered task-schedule issue 
are fixed now (the google website response is received/processed fast 
again). I've done some testing, and installed the updated/patched 
version on my production box last night. At the moment it still works 
properly. Activated my lua healthchecker and mailer tasks and enabled 3 
threads again.. Let's see how it goes :), but as said, for now it seems to 
work alright.


Does the second issue you found and fixed clear the initial 'doubts 
about the reliability' of the first one? Or did you have a particular 
possibly problematic scenario in mind that I could try and check for?


For the moment I think it is more 'reliable / stable' with the patches 
than without. So in that regard I think they could be merged.


There seems to be an issue with tcp:settimeout() though. But I'll put that 
in a different mail-thread as I'm not sure it's related, and the issues 
where this thread started are fixed.


Regards,

PiBa-NL (Pieter)




Re: DNS resolver + threads, 100% cpu usage / hang 1.9dev

2018-05-22 Thread PiBa-NL

Hi Olivier,

Op 22-5-2018 om 18:46 schreef Olivier Houchard:

Hi Pieter,

Does the attached patch fix it for you ? It's been generated from master,
but will probably apply against 1.8 as well.

Thanks !

Olivier


The patch works for me (on master, didn't try with 1.8). Or at least I've 
been running the same testbox for an hour now without issue.

Thanks !

Regards,
PiBa-NL (Pieter)




Re: warnings during loading load-server-state, expected?

2018-05-19 Thread PiBa-NL

Hi Daniel,

Op 20-5-2018 om 2:08 schreef Daniel Corbett:

Hi Pieter,

While I'm not sure what may be happening in regard to the 
server-template messages that you have pointed out,


I have run into the unix socket one a couple weeks ago and have been 
meaning to send this patch to the mailing list.

What is happening is that currently only AF_INET and AF_INET6 are 
checked within the switch statement when dumping the servers state. 
This causes the value of srv_addr to be empty and thus a missing field 
in the server state file.


This patch adds a default case that sets srv_addr to "-" when not 
covered by a socket family.


This should be backported to 1.8

Thanks,
-- Daniel



Works for me to get rid of the unix-socket message.
So 1 issue fixed, and 1 to go on this subject ;)

Regards,
PiBa-NL (Pieter)




DNS resolver + threads, 100% cpu usage / hang 1.9dev

2018-05-19 Thread PiBa-NL

Hi List,

With 1.8.8 I ran into this; the latest 1.9dev snapshot seems to have the 
same issue..


Running with 3 threads, a template for 8 servers, and only 2 IPs in the 
DNS response, neither of which is actually 'up': one responds with 
'L4TOUT in 1004ms', the other with 'L4CON in 0ms' on the stats page.. DNS 
comes from a local unbound DNS server on the same host as the haproxy 
process. After a little while (10 minutes+-) haproxy goes to 300% usage 
on all the 3 spin-locks and doesn't respond anymore.


There was no actual traffic passing through this machine. I am watching 
the stats page.

Running with 1 thread the issue does not seem to appear.

I think 2 threads are deadlocking each other.?. and then later the 3rd 
joins the waiting game.
I've added logging between most of the lock/unlock functions.. 
and it seems that the 2 succeeded locks below, from lines 332 and 333 in 
the attached logfile, are the last ones where threads 1 and 0 are doing 
anything..


For example, I added logging like this everywhere; the thread id, the 
server name, and the function it's inside of are logged with printf 
statements..:


static inline void health_adjust(struct server *s, short status)
{
    HA_SPIN_LOCK(SERVER_LOCK, &s->lock);
    printf("tid:%u LOCKED srv: %s health_adjust\n", tid, s->id);
    /* return now if observing nor health check is not enabled */
    if (!s->observe || !s->check.task) {
        printf("tid:%u UNLOCK srv: %s health_adjust a\n", tid, s->id);
        HA_SPIN_UNLOCK(SERVER_LOCK, &s->lock);
        return;
    }

    __health_adjust(s, status);
    printf("tid:%u UNLOCK srv: %s health_adjust b\n", tid, s->id);
    HA_SPIN_UNLOCK(SERVER_LOCK, &s->lock);
}

This leads to the following calls that are seemingly never unlocked; the 
tid: lines stop appearing, and the actual call stacks go several functions 
deeper, trying to gain another lock:


tid:1 LOCKED srv: alias_srv1 process_chk_conn
tid:0 LOCKED srv: alias_srv8 event_srv_chk_w

Then a little later tid:2 also hangs.. probably it was time to 
perform another check or DNS lookup or so...


(gdb) info threads
  Id   Target Id Frame
  1    LWP 100730 of process 15238 0x004ab6bc in 
snr_check_ip_callback (srv=0x8022d0400, ip=0x80288e394,

    ip_family=0x7fffe207 "\002\224\343\210\002\b") at src/server.c:3781
  2    LWP 100924 of process 15238 0x004ab6b4 in 
snr_check_ip_callback (srv=0x8022cb000, ip=0x80288e394,

    ip_family=0x7fffdfffda97 "\002\224\343\210\002\b") at src/server.c:3781
* 3    LWP 100925 of process 15238 0x0051ef21 in 
dns_resolve_recv (dgram=0x8022483c0) at src/dns.c:1646


More gdb 'bt full' output of all 3 threads is at the bottom of the 
attached logfile.


Hope someone can try and fix this :) Config and haproxy -vv also added 
below.


Thanks
PiBa-NL (Pieter)

global
    maxconn            3002
    log            /var/run/log    local0    info
    stats socket /tmp/haproxy.socket level admin  expose-fd listeners
    uid            80
    gid            80
    nbproc            1
    nbthread            3
    hard-stop-after        15m
    chroot                /tmp/haproxy_chroot
    daemon
    tune.ssl.default-dh-param    2048
    log-send-hostname        haproxy-pb-test

    defaults
      # never fail on address resolution last,libc,none
      default-server init-addr last,none

      stats show-legends

    userlist myuserlist
      user admin insecure-password pass

listen HAProxyLocalStats
    bind :80 name localstats
    mode http
    stats enable
    stats refresh 5
    stats admin if TRUE
    stats uri /
stats show-legends
    timeout client 5000
    timeout connect 5000
    timeout server 5000

mailers globalmailers
    mailer pbmail 192.168.0.40:25

resolvers globalresolvers
    nameserver goog 127.0.0.1:53
    resolve_retries 3
    timeout retry 1s
    hold valid 10s

frontend alias_test
    bind            1.2.3.1:21 name 1.2.3.1:21   transparent
    bind            1.2.3.1:1-10002 name 1.2.3.1:1-10002 
transparent

    bind            1.2.3.2:21 name 1.2.3.2:21   transparent
    bind            1.2.3.2:1-10002 name 1.2.3.2:1-10002 
transparent

    bind            1:2:3::3:21 name 1:2:3::3:21   transparent
    bind            1:2:3::3:1-10002 name 1:2:3::3:1-10002 
transparent

    bind            1:2:3::4:21 name 1:2:3::4:21   transparent
    bind            1:2:3::4:1-10002 name 1:2:3::4:1-10002 
transparent

    bind            1.2.3.9:21 name 1.2.3.9:21   transparent
    bind            1.2.3.9:1-10002 name 1.2.3.9:1-10002 
transparent

    mode            http
    log            global
    option            socket-stats
    option            http-keep-alive
    timeout client        3
    default_backend alias_back_http_ipvANY

backend alias_back_http_ipvANY
    mode            http
    id            134
    log            global
    # use mailers
    # level  i

warnings during loading load-server-state, expected?

2018-05-19 Thread PiBa-NL

Hi List,

Is it expected to get warnings on unix-sockets and on server-template 
entries that don't have enough A records?

It seems like it's trying to do something wrong..

I'm using a global server state file with a defaults section that loads it:
global
    server-state-file /tmp/haproxy_server_state
defaults
      load-server-state-from-file global

Server-template for 8 servers where dns returns 2 A records.
    server-template            alias_srv 8 
dns_server_name.pfs.local:809 check inter 1000  resolvers 
globalresolvers init-addr last,libc,none
So servers 2/7 are in maintenance mode without an IP.. should it warn 
about that??:
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv2', invalid srv_admin_state 
value '32'
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv3', invalid srv_admin_state 
value '32'
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv4', invalid srv_admin_state 
value '32'
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv5', invalid srv_admin_state 
value '32'
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv6', invalid srv_admin_state 
value '32'
[WARNING] 138/234928 (87221) : server-state application failed for 
server 'alias_back_http_ipvANY/alias_srv7', invalid srv_admin_state 
value '32'


And the unix socket also has different warnings:
    server            plainhttpsocket /testje.socket 
send-proxy-v2-ssl-cn backup weight 1 resolvers globalresolvers  id 120
[WARNING] 138/203832 (19767) : server-state application failed for 
server 'vhost1_http_ipvANY/plainhttpsocket', invalid srv_iweight value 
'3213', invalid srv_f_forced_id value '-'

Other run:
[WARNING] 138/234743 (12338) : server-state application failed for 
server 'vhost1_http_ipvANY/plainhttpsocket', invalid srv_f_forced_id 
value '-'


Tried with 1.8.8 and 1.9dev.
Should I be doing something differently? Or should I just ignore those? Or 
is there a way to suppress those messages?


Regards,
PiBa-NL (Pieter)



Re: 1.8.8 & 1.9dev, lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-18 Thread PiBa-NL

Hi Thierry,

Op 18-5-2018 om 20:00 schreef Thierry FOURNIER:

Hi Pieter,

Could you test the attached patch ? It seems to fix the problem, but I
have some doubts about the reliability of the patch.

Thierry
The crash seems 'fixed' indeed.. but the lua script below now takes 
5 seconds instead of 150ms.


Regards,
PiBa-NL (Pieter)

con = core.tcp()
con:connect_ssl("216.58.212.132",443) --google: 216.58.212.132
request = [[GET / HTTP/1.0

]]
con:send(request)
res = con:receive("*a")
con:close()




Re: 1.8.8 & 1.9dev, lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-11 Thread PiBa-NL

Hi Thierry,

Okay found a simple reproduction with tcploop with a 6 second delay in 
there and a short sleep before calling kqueue.


./tcploop 81 L W N20 A R S:"response1\r\n" R P6000 S:"response2\r\n" R [ 
F K ]


 gettimeofday(&before_poll, NULL);
+    usleep(100);
 status = kevent(kqueue_fd[tid], // int kq

Together with the attached config the issue is reproduced every time the 
/myapplet url is requested.


Output as below:
:stats.clihdr[0007:]: Accept-Language: 
nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7

[info] 130/195936 (76770) : Wait for it..
[info] 130/195937 (76770) : Wait response 2..
  xref_get_peer_and_lock xref->peer == 1

Hope this helps to come up with a solution..

Thanks in advance,
PiBa-NL (Pieter)

Op 9-5-2018 om 19:47 schreef PiBa-NL:

Hi Thierry,

Op 9-5-2018 om 18:30 schreef Thierry Fournier:
It seems a dead lock, but you observe a loop. 
Effectively it is a deadlock; it keeps looping over these few lines of 
code below from xref.h 
<http://git.haproxy.org/?p=haproxy.git;a=blob_plain;f=include/common/xref.h;hb=29d698040d6bb56b29c036aeba05f0d52d8ce94b>. 
The XCHG just swaps the 2 values (both are '1') and continues on; then 
the local==BUSY check is true, it loops and swaps 1 and 1 again, and 
the circle continues..


Thanks for looking into it :) I'll try to get a 'simpler' reproduction 
with some well-placed sleep() calls as you suggest.

Regards,
PiBa-NL

http://git.haproxy.org/?p=haproxy.git;a=blob;f=include/common/xref.h;h=6dfa7b62758dfaebe12d25f66aaa858dc873a060;hb=29d698040d6bb56b29c036aeba05f0d52d8ce94b 



function myapplet(applet)

  core.Info("Wait for it..")
  core.sleep(1)

  local result = ""
con = core.tcp()
con:settimeout(1)
con:connect("127.0.0.1",81)
con:send("Test1\r\n")
r = con:receive("*l")
result = result .. tostring(r)
con:send("Test\r\n")
  core.Info("Wait response 2..")
r2 = con:receive("*l")
result = result .. tostring(r2)
  core.Info("close..")
con:close()
  core.Info("DONE")

response = "Finished"
applet:add_header("Server", "haproxy/webstats")
applet:add_header("Content-Type", "text/html")
applet:start_response()
applet:send(response)

end

core.register_service("myapplet", "http", myapplet)
global
  lua-load /root/haproxytest/hang_timeout_close.lua

defaults
mode http
timeout connect 5s
timeout client 30s
timeout server 60s
  
frontend stats
bind *:80
stats enable
stats admin if TRUE
stats refresh 1s

  acl myapplet path -m beg /myapplet
  http-request use-service lua.myapplet if myapplet

 include/common/xref.h | 4 
 src/ev_kqueue.c   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/include/common/xref.h b/include/common/xref.h
index 6dfa7b6..e6905a1 100644
--- a/include/common/xref.h
+++ b/include/common/xref.h
@@ -25,6 +25,10 @@ static inline void xref_create(struct xref *xref_a, struct xref *xref_b)
 
 static inline struct xref *xref_get_peer_and_lock(struct xref *xref)
 {
+   if (xref->peer == 1) {
+   printf("  xref_get_peer_and_lock xref->peer == 1 \n");
+   }
+
struct xref *local;
struct xref *remote;
 
diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index bf7f666..732f20d 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -145,6 +145,7 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 
fd = global.tune.maxpollevents;
	gettimeofday(&before_poll, NULL);
+   usleep(100);
status = kevent(kqueue_fd[tid], // int kq
NULL,  // const struct kevent *changelist
0, // int nchanges


Re: Eclipse 403 access denied

2018-05-11 Thread PiBa-NL

Hi Norman,

Op 11-5-2018 om 19:36 schreef Norman Branitsky:


After upgrading to the latest version of Eclipse and installing our 
custom Eclipse Plugin,


my developers are now being blocked by HAProxy.

Here’s a sample of the problem:

May 11 15:03:37 localhost haproxy[13089]: 66.192.142.9:43041 
[11/May/2018:15:03:37.932] main_ssl~ 
ssl_backend-etkdev/i-09120e3b
0/0/1/24/25 200 436 - - --NN 52/52/0/0/0 0/0 "GET 
/entellitrak/private/api/workspaces/query/current HTTP/1.1"


May 11 15:03:38 localhost haproxy[13089]: 66.192.142.9:56417 
[11/May/2018:15:03:38.117] main_ssl~ main_ssl/
0/-1/-1/-1/0 403 188 - - PR-- 50/50/0/0/0 0/0 "POST 
/entellitrak/private/api/packages/query/workspace/t.jx HTTP/1.1"




" PR   The proxy blocked the client's HTTP request, either because of an
  invalid HTTP syntax, in which case it returned an HTTP 400 error to
  the client, or because a deny filter matched, in which case it
  returned an HTTP 403 error."


So, is the 403 because the backend server is unknown in the 2^nd request?

Or is the backend server unknown because of the 403?

This is the beginning of the JSON payload in the POST statement:

ID: 24

Address: 
https://etkdev.wisits.org/entellitrak/private/api/packages/query/workspace/thomas.jackson


Http-Method: POST

Content-Type: application/json

Headers: {Authorization=[Basic dGhvbWFzLmphY2tzb246UGFzc3dvcmQxIQ==], 
Content-Type=[application/json], Accept=[application/json]}



Could it be that the 'Host' header is missing? It is required by HTTP/1.1.
Also, the Authorization header above can be decoded.. be careful what 
internal/secure information is posted..


Payload: 
["package.fileServer.c0413431-1236-4825-90f1-5f5be131a237","package.rfWorkflowParameterJavascript.a227ee0b-6b59-4643-b1f8-1ff203948a24",


HAProxy version info:

[WIIRIS-LB-240]# /usr/local/sbin/haproxy -vv

HA-Proxy version 1.7.9 2017/08/18

Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :

  TARGET  = linux2628

  CPU = generic

  CC  = gcc

  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement 
-fwrapv


  OPTIONS = USE_SLZ=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :

  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes

Built with libslz for stateless compression.

Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")


Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013

Running on OpenSSL version : OpenSSL 1.0.2l  25 May 2017 (VERSIONS 
DIFFER!)



p.s. Running with a different OpenSSL version than the one built against is a bad thing..

Regards,

PiBa-NL



[PATCH] BUG/MEDIUM: pollers/kqueue: use incremented position in event list

2018-05-09 Thread PiBa-NL

Hi Olivier,

Please take a look at the attached patch. When adding 2 fds, the second 
one overwrote the first.
Tagged it MEDIUM as haproxy just didn't work at all (with kqueue), 
though it could perhaps also be MINOR, as the offending commit was only 
made recently. Anyhow, this seems to fix it :). I tried to go with an 
'int reference pointer' but couldn't find the best syntax to do it that 
way, and this seems clean enough as well. Though if you think a different 
approach (a thread-local int?) or any other way is better, please change 
it or advise :)


Issue was introduced here: 
http://git.haproxy.org/?p=haproxy.git;a=commit;h=6b96f7289c2f401deef4bdc6e20792360807dde4


Thanks,
PiBa-NL (Pieter)
From 3c60fdace2f23a8c6d070c8fafab660becb8c514 Mon Sep 17 00:00:00 2001
From: PiBa-NL <piba.nl@gmail.com>
Date: Thu, 10 May 2018 01:01:28 +0200
Subject: [PATCH] BUG/MEDIUM: pollers/kqueue: use incremented position in event
 list

When composing the event list for subscribing to kqueue events, the index 
where a new event is added must come after the previously added events; as 
such, the changes counter should continue counting instead of restarting.

Without this, haproxy accepted connections but never tried to read and 
process the incoming data.

This patch is for 1.9 only
---
 src/ev_kqueue.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 926f77c..09e191c 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -33,10 +33,10 @@ static int kqueue_fd[MAX_THREADS]; // per-thread kqueue_fd
 static THREAD_LOCAL struct kevent *kev = NULL;
 static struct kevent *kev_out = NULL; // Trash buffer for kevent() to write the eventlist in
 
-static int _update_fd(int fd)
+static int _update_fd(int fd, int start)
 {
int en;
-   int changes = 0;
+   int changes = start;
 
en = fdtab[fd].state;
 
@@ -91,7 +91,7 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
activity[tid].poll_drop++;
continue;
}
-   changes += _update_fd(fd);
+   changes = _update_fd(fd, changes);
}
/* Scan the global update list */
	for (old_fd = fd = update_list.first; fd != -1; fd = fdtab[fd].update.next) {
@@ -109,7 +109,7 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
continue;
if (!fdtab[fd].owner)
continue;
-   changes += _update_fd(fd);
+   changes = _update_fd(fd, changes);
}
 
if (changes) {
-- 
2.10.1.windows.1



Re: 1.8.8 & 1.9dev, lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-09 Thread PiBa-NL

Hi Thierry,

Op 9-5-2018 om 18:30 schreef Thierry Fournier:
It seems a dead lock, but you observe a loop. 
Effectively it is a deadlock: it keeps looping over these few lines of 
code below from xref.h 
<http://git.haproxy.org/?p=haproxy.git;a=blob_plain;f=include/common/xref.h;hb=29d698040d6bb56b29c036aeba05f0d52d8ce94b>. 
The XCHG just swaps the two values (both are '1') and continues on; then, 
since the local==BUSY check is true, it loops and swaps 1 and 1 again, and 
the circle continues..


Thanks for looking into it :) I'll try to get a 'simpler' reproduction 
with some well-placed sleep() calls, as you suggest.

Regards,
PiBa-NL

http://git.haproxy.org/?p=haproxy.git;a=blob;f=include/common/xref.h;h=6dfa7b62758dfaebe12d25f66aaa858dc873a060;hb=29d698040d6bb56b29c036aeba05f0d52d8ce94b

 31     while (1) {
 32 
 33             /* Get the local pointer to the peer. */
 34             local = HA_ATOMIC_XCHG(&xref->peer, XREF_BUSY);
 35 
 36             /* If the local pointer is NULL, the peer no longer exists. */
 37             if (local == NULL) {
 38                     xref->peer = NULL;
 39                     return NULL;
 40             }
 41 
 42             /* If the local pointer is BUSY, the peer try to acquire the
 43              * lock. We retry the process.
 44              */
 45             if (local == XREF_BUSY)
 46                     continue;



Re: 1.8.8 & 1.9dev, lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-07 Thread PiBa-NL

Hi List, Thierry,

Actually this is not limited to restarts, and it also happens with 1.9dev. 
It now happened after haproxy had been running for a while with no restart 
attempted, while running/debugging in my NetBeans IDE..


Root cause imo is that hlua_socket_receive_yield and hlua_socket_release 
both try to acquire the same lock.



For debugging purposes ive added some code in 
hlua_socket_receive_yield(..) before the stream_int_notify:


    struct channel *ic2 = si_ic(si);
    struct channel *oc2 = si_oc(si);
    ha_warning("hlua_socket_receive_yield calling notify peer:%9x 
si[0].state:%d oc2.flag:%09x ic2.flag:%09x\n", peer, s->si[0].state, 
oc2->flags, ic2->flags);

    stream_int_notify(&s->si[0]);

And:
static void hlua_socket_release(struct appctx *appctx)
{
    struct xref *peer;
    if (appctx->ctx.hlua_cosocket.xref.peer > 1)
        ha_warning("hlua_socket_release peer: %9x %9x\n", 
appctx->ctx.hlua_cosocket.xref, appctx->ctx.hlua_cosocket.xref.peer->peer);

    else
        ha_warning("hlua_socket_release peer: %9x 0\n", 
appctx->ctx.hlua_cosocket.xref);



And also added code in xref_get_peer_and_lock(..):
static inline struct xref *xref_get_peer_and_lock(struct xref *xref)
{
    if (xref->peer == 1) {
        printf("  xref_get_peer_and_lock xref->peer == 1 \n");
    }


This produces the logging:

[WARNING] 127/001127 (36579) : hlua_socket_receive_yield calling notify 
peer:  2355590  si[0].state:7 oc2.flag:0c000c220 ic2.flag:00084a024

[WARNING] 127/001127 (36579) : hlua_socket_release peer: 1 0
  xref_get_peer_and_lock xref->peer == 1

When xref_get_peer_and_lock is called while xref->peer holds the value 1 
(XREF_BUSY), it keeps swapping 1 with 1 until the value is no longer 1, 
which never happens..


As for oc2.flags, it contains CF_SHUTW_NOW, though I'm still not 100% sure 
when exactly that flag gets set, so I don't have a foolproof reproduction.. 
But it happens on pretty much a daily basis for me in production, and in 
test I can now usually trigger it after a few test runs with no actual 
traffic passing, within the first minute of running (health checks are 
performed on several backends, and a mail or two is sent by the lua 
code during this startup period..).. with the full production config..


Below the stacktrace that comes with it..

xref_get_peer_and_lock (xref=0x802355590) at 
P:\Git\haproxy\include\common\xref.h:37

hlua_socket_release (appctx=0x802355500) at P:\Git\haproxy\src\hlua.c:1595
si_applet_release (si=0x8023514c8) at 
P:\Git\haproxy\include\proto\stream_interface.h:233
stream_int_shutw_applet (si=0x8023514c8) at 
P:\Git\haproxy\src\stream_interface.c:1504
si_shutw (si=0x8023514c8) at 
P:\Git\haproxy\include\proto\stream_interface.h:320
stream_int_notify (si=0x8023514c8) at 
P:\Git\haproxy\src\stream_interface.c:465
hlua_socket_receive_yield (L=0x80223b388, status=1, ctx=0) at 
P:\Git\haproxy\src\hlua.c:1789

?? () at null:
?? () at null:
lua_resume () at null:
hlua_ctx_resume (lua=0x8022cb800, yield_allowed=1) at 
P:\Git\haproxy\src\hlua.c:1022

hlua_process_task (task=0x80222a500) at P:\Git\haproxy\src\hlua.c:5556
process_runnable_tasks () at P:\Git\haproxy\src\task.c:232
run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2401
run_thread_poll_loop (data=0x802242080) at P:\Git\haproxy\src\haproxy.c:2463
main (argc=4, argv=0x7fffea80) at P:\Git\haproxy\src\haproxy.c:3053

I don't yet have any idea about the direction of a possible fix.. :(..
The issue is that the hlua_socket_release should probably still happen, but 
it doesn't know what socket / peer it should release at that point.. it's 
in the local peer variable of the hlua_socket_receive_yield function.. 
Should it be 'unlocked' before calling stream_int_notify??


Does anyone dare to take a stab at creating a patch? If so, thanks in 
advance ;)


Regards,
PiBa-NL (Pieter)


Op 3-5-2018 om 1:30 schreef PiBa-NL:

Hi List,

Sometimes after a few 'restarts' of haproxy 1.8.8 (using -sf  
parameter) one of the processes seems to get into a 'hanging' state 
consuming 100% cpu..


In this configuration i'm using 'nbthread 1' not sure if this is 
related to the corrupted task-tree from my other lua issue.?. 
https://www.mail-archive.com/haproxy@formilux.org/msg29801.html .?.


Also i'm using my new smtpmailqueue and serverhealthchecker lua 
scripts (can be found on github.).. these probably 'contribute' to 
triggering the condition.


Anything i can check / provide.?

(can't really share the config itself a.t.m. as it's from our production 
env, but it has like 15 backends with 1 server each, a little header 
rewriting/insertion but nothing big..)


GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  T

[PATCH] BUG/MINOR: lua: schedule socket task when lua connect() is called

2018-05-05 Thread PiBa-NL

Hi List, Thierry, Willy,

Created another little patch. Hope this one fits all submission criteria.

Regards,
PiBa-NL (Pieter)
From cc4adb62c55f268e9e74853f4a4893e2a3734aec Mon Sep 17 00:00:00 2001
From: PiBa-NL <piba.nl@gmail.com>
Date: Sat, 5 May 2018 23:51:42 +0200
Subject: [PATCH] BUG/MINOR: lua: schedule socket task upon lua connect()

Parameters like the server address, port and timeout should be set before
the process_stream task is called, to avoid the stream being 'closed' before
it got initialized properly. This is most clearly visible when running with
tune.lua.forced-yield=1: the error
"socket: not yet initialised, you can't set timeouts." would then appear.
So scheduling the task should not be done when creating the lua socket,
but when connect is called.

Below code for example also shows this issue, as the sleep will
yield the lua code:
  local con = core.tcp()
  core.sleep(1)
  con:settimeout(10)
---
 src/hlua.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index 4c56409..07366af 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -2423,6 +2423,10 @@ __LJMP static int hlua_socket_connect(struct lua_State 
*L)
WILL_LJMP(luaL_error(L, "out of memory"));
}
	xref_unlock(&socket->xref, peer);
+   
+   task_wakeup(s->task, TASK_WOKEN_INIT);
+   /* Return yield waiting for connection. */
+
WILL_LJMP(hlua_yieldk(L, 0, 0, hlua_socket_connect_yield, 
TICK_ETERNITY, 0));
 
return 0;
@@ -2582,8 +2586,6 @@ __LJMP static int hlua_socket_new(lua_State *L)
strm->flags |= SF_DIRECT | SF_ASSIGNED | SF_ADDR_SET | SF_BE_ASSIGNED;
strm->target = _tcp.obj_type;
 
-   task_wakeup(strm->task, TASK_WOKEN_INIT);
-   /* Return yield waiting for connection. */
return 1;
 
  out_fail_stream:
-- 
2.10.1.windows.1



Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-05-04 Thread PiBa-NL

Hi Olivier,

Op 4-5-2018 om 15:54 schreef Olivier Houchard:

Hi Pieter,

Thanks a lot for the detailed analysis. That seems spot on.
We decided to do something a bit different than your proposed fix.
Does the attached patch fix your problems ?

Thanks again !

Olivier


Thanks for this fix, it also works for me, regarding the crash that 
happened.


As for the partial print_r() result this thread started with (less 
important than fixing the crash): I guess the service task simply stopping 
once the client is done/disconnected is 'by design'? Anyhow, it's not a 
real big issue; I just didn't expect it.


Regards,

PiBa-NL




Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-05-03 Thread PiBa-NL

Hi Thierry,

Op 3-5-2018 om 8:59 schreef Thierry Fournier:

he bug. I even installed a FreeBSD:-)  I add Willy in
copy, maybe he will reproduce it.

Thierry


The 'trick' is probably sending as few requests as possible through a 
'high latency' vpn (17ms for a ping from client to haproxy machine..).


Haproxy startup.
    Line 17: :TestSite.clireq[0007:]: GET 
/haproxy?stats HTTP/1.1
    Line 34: 0002:TestSite.clireq[0007:]: GET /favicon.ico 
HTTP/1.1
    Line 44: 0001:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1
    Line 133: 0003:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1
    Line 220: 0004:TestSite.clireq[0008:]: GET /favicon.ico 
HTTP/1.1
    Line 233: 0005:TestSite.clireq[0008:]: GET 
/haproxy?stats HTTP/1.1
    Line 251: 0006:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1

Crash..

Sometimes it takes a few more, but it's not really consistent.. It's 
rather timing sensitive, I guess..



But besides the reproduction, what is the theory behind the tasks and 
their cleanup; how 'should' it work?
The Chrome browser makes a few requests to haproxy, some for the stats 
page and some for the lua service (and a favicon in between)..

At one point in time the tcp connection for the lua service gets closed 
before the applet is done running, and process_stream starts to call 
si_shutw.. a few calls deeper, hlua_applet_http_release removes the http 
task from the list..


static void hlua_applet_http_release(struct appctx *ctx)
{
    task_delete(ctx->ctx.hlua_apphttp.task);
    task_free(ctx->ctx.hlua_apphttp.task);

Then when the current task is 'done' it will move to the next one: the 
rq_next in the process loop.. that however is pointing to the 
deleted/freed hlua_apphttp.task..?.. So getting the next task from that 
already destroyed element will fail...


Perhaps something like the patch below could work?
Does it make sense? (The same should then be done for the tcp and cli 
tasks, I guess..)
For my testcase it doesn't crash anymore with that change. But I'm not 
sure if it's now leaking memory instead in some cases.. Is there an easy 
way to check?


Regards,
PiBa-NL (Pieter)


diff --git a/src/hlua.c b/src/hlua.c
index 4c56409..6515f52 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -6635,8 +6635,7 @@ error:

 static void hlua_applet_http_release(struct appctx *ctx)
 {
-    task_delete(ctx->ctx.hlua_apphttp.task);
-    task_free(ctx->ctx.hlua_apphttp.task);
+    ctx->ctx.hlua_apphttp.task->process = NULL;
 ctx->ctx.hlua_apphttp.task = NULL;
 hlua_ctx_destroy(ctx->ctx.hlua_apphttp.hlua);
 ctx->ctx.hlua_apphttp.hlua = NULL;
diff --git a/src/task.c b/src/task.c
index fd9acf6..d6ab0b9 100644
--- a/src/task.c
+++ b/src/task.c
@@ -217,6 +217,13 @@ void process_runnable_tasks()
         t = eb32sc_entry(rq_next, struct task, rq);
         rq_next = eb32sc_next(rq_next, tid_bit);
         __task_unlink_rq(t);
+            if (!t->process) {
+                // task was 'scheduled' to be destroyed (for example a hlua_apphttp.task).
+                task_delete(t);
+                task_free(t);
+                continue;
+            }
+
         t->state |= TASK_RUNNING;
         t->pending_state = 0;





Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-03 Thread PiBa-NL

Hi Tim, Willy,

Apparently even a simple copy/paste is too difficult for me to do right 
sometimes; really sorry about that.. :/

Thanks for merging and explaining, I'll try to do better next time :)

Regards,
PiBa-NL



1.8.8 lua, xref_get_peer_and_lock hang / 100% cpu usage after restarting haproxy a few times

2018-05-02 Thread PiBa-NL

Hi List,

Sometimes after a few 'restarts' of haproxy 1.8.8 (using -sf  
parameter) one of the processes seems to get into a 'hanging' state 
consuming 100% cpu..


In this configuration i'm using 'nbthread 1' not sure if this is related 
to the corrupted task-tree from my other lua issue.?. 
https://www.mail-archive.com/haproxy@formilux.org/msg29801.html .?.


Also i'm using my new smtpmailqueue and serverhealthchecker lua scripts 
(can be found on github.).. these probably 'contribute' to triggering 
the condition.


Anything i can check / provide.?

(can't really share the config itself a.t.m. as it's from our production 
env, but it has like 15 backends with 1 server each, a little header 
rewriting/insertion but nothing big..)


GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 


This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/haproxy...done.
Attaching to program: /usr/local/sbin/haproxy, process 68580
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols 
found)...done.

Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols 
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols 
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging 
symbols found)...done.

Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols 
found)...done.

[Switching to LWP 100340 of process 68580]
0x0044890b in xref_get_peer_and_lock (xref=0x80254b680) at 
include/common/xref.h:34

34  include/common/xref.h: No such file or directory.
(gdb) next
37  in include/common/xref.h
(gdb) next
45  in include/common/xref.h
(gdb) next
46  in include/common/xref.h
(gdb) next
34  in include/common/xref.h
(gdb) next
37  in include/common/xref.h
(gdb) next
45  in include/common/xref.h
(gdb) next
46  in include/common/xref.h
(gdb) next
[... the same 34 -> 37 -> 45 -> 46 cycle in include/common/xref.h repeats 
for every further 'next', indefinitely ...]

Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread PiBa-NL

Hi Tim,

Op 3-5-2018 om 0:26 schreef Tim Düsterhus:

Pieter,

Am 02.05.2018 um 23:54 schrieb PiBa-NL:

If commit message needs tweaking please feel free to do so :).


obviously not authoritative for this, but I noticed directly that the
first line of your message is very long. It should generally be about 60
characters, otherwise it might get truncated. The following lines should
be no more than 76 characters to avoid wrapping with the common terminal
width of 80.

Also after the severity and the component a colon should be used, not a
comma.

I suggest something like this for the first line. It is succinct and
gives a good idea of what the bug might be. But please check whether I
grasped the issue properly.

BUG/MINOR: lua: Put tasks to sleep when waiting for data

Best regards
Tim Düsterhus


Thanks, valid comments. The colons I should have noticed; I could have 
sworn I did those correctly.. but no ;). I thought I had carefully 
constructed the commit message by looking at a few others, while cramming 
in as much 'info' about the change/issue as possible on a line (better 
than 'fixed #123' as the whole subject, as some committers do in 
another project I contribute to, but that's off-topic). As for line 
lengths, it didn't even cross my mind to look at that. (I also didn't 
look up the contributing guide, sorry, though it doesn't seem to specify 
specific line lengths?)


Anyhow, I changed the title as suggested; it still covers the change. And 
I adjusted the line wrapping in the message to not exceed 76 characters.


Regards,

PiBa-NL (Pieter)

From a0b01cdc8ccc4ae95c5c03bc98bf859b6115d2f9 Mon Sep 17 00:00:00 2001
From: PiBa-NL <piba.nl@gmail.com>
Date: Wed, 2 May 2018 22:27:14 +0200
Subject: [PATCH] BUG/MINOR: lua: Put tasks to sleep when waiting for data

If a lua socket is waiting for data it currently spins at 100% cpu usage.
This is because the TICK_ETERNITY returned by the socket is ignored when 
setting the 'expire' time of the task.

Fixed by removing the check for yields that return TICK_ETERNITY.

This should be backported to at least 1.8.
---
 src/hlua.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index 32199c9..4c56409 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5552,8 +5552,7 @@ static struct task *hlua_process_task(struct task *task)
 
case HLUA_E_AGAIN: /* co process or timeout wake me later. */
	notification_gc(&hlua->com);
-   if (hlua->wake_time != TICK_ETERNITY)
-   task->expire = hlua->wake_time;
+   task->expire = hlua->wake_time;
break;
 
/* finished with error. */
-- 
2.10.1.windows.1



[PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-02 Thread PiBa-NL

Hi List, Willy, Thierry, Emeric,

Tried a little patch for my 100% cpu usage issue.
https://www.mail-archive.com/haproxy@formilux.org/msg29762.html

It stops the cpu usage reported in the above thread.. Just wondering if 
there are any culprits that might now 'hang' a lua applet instead?


I think this issue was introduced here (haven't tried to bisect it..): 
http://git.haproxy.org/?p=haproxy.git;a=commitdiff;h=253e53e661c49fb9723535319cf511152bf09bc7


Possibly due to some TICK_ETERNITY that shouldn't actually wait an 
eternity?


Thoughts are welcome :) Or perhaps an 'all okay' so it can be merged?
If the commit message needs tweaking please feel free to do so :).

Regards,
PiBa-NL (Pieter)
From a0b01cdc8ccc4ae95c5c03bc98bf859b6115d2f9 Mon Sep 17 00:00:00 2001
From: PiBa-NL <piba.nl@gmail.com>
Date: Wed, 2 May 2018 22:27:14 +0200
Subject: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for
 io suspend until woken up by the a corresponding event.

If a lua socket is waiting for data it currently spins at 100% cpu usage. This 
is because the TICK_ETERNITY returned by the socket is ignored when setting the 
'expire' time of the task.

Fixed by removing the check for yields that return TICK_ETERNITY.

This should be backported to at least 1.8.
---
 src/hlua.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/hlua.c b/src/hlua.c
index 32199c9..4c56409 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -5552,8 +5552,7 @@ static struct task *hlua_process_task(struct task *task)
 
case HLUA_E_AGAIN: /* co process or timeout wake me later. */
	notification_gc(&hlua->com);
-   if (hlua->wake_time != TICK_ETERNITY)
-   task->expire = hlua->wake_time;
+   task->expire = hlua->wake_time;
break;
 
/* finished with error. */
-- 
2.10.1.windows.1



Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-04-27 Thread PiBa-NL

Hi Thierry,

Op 27-4-2018 om 1:54 schreef PiBa-NL:

Hi Thierry,

Op 26-4-2018 om 12:25 schreef thierry.fourn...@arpalert.org:

Your trace shows a corrupted tree. Maybe it is due to the freebsd
architecture and the corruption is not reproducible on Linux? I do not have
freebsd for testing.

Regards,
Thierry


My 'best' reproduction scenario involves around 50 to 100 requests total, 
sometimes fewer, made from my Chrome browser (over a vpn connection which 
adds a little latency.. a ping/reply to the haproxy machine takes +-17ms; 
maybe that helps reproduce.. :/ )
I've changed the lua function 'print_r' to not write anything in its 
wr() function, to rule out the the freebsd console / ssh session to be 
causing the issue. It stays the same though.


Also been adding some printf statements to how the eb_tree is used, 
which i think might be interesting..

    printf("eb32sc_insert(%d)\n",new);
    printf("eb32sc_next(%d)  leaf_p: %d \n",eb32, eb32->node.leaf_p);
    printf("eb32sc_delete(%d)\n",eb32);

The pattern I see here is that usually a task is inserted, the next 
task is looked up, and the task gets deleted again.
In the last round before the crash I see that the task is inserted, then 
deleted, and then afterwards the next task is retrieved for that 
same 'item'.. which fails.. Perhaps because it was the last task to 
run? And there is no longer a higher root to jump higher up the 
tree? I must admit I don't fully grasp how the tree is traversed 
exactly.. I seem to see that at least for the first task put 
into the tree there is some special handling.. On the other hand, if 
it isn't in the tree, is there technically still a 'next' item??

Below the last part of that logging, and also attached the complete 
log from the start.. Perhaps it gives a clue.?.


Regards,

PiBa-NL (Pieter)

eb32sc_insert(35826016)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826016)  leaf_p: 8606272
eb32sc_delete(35826016)
task_wakeup  35826016  32
active_tasks_mask |= t->thread_mask  (35826016)
eb32sc_insert(35826016)
task_wakeup  35826656  8
active_tasks_mask |= t->thread_mask  (35826656)
eb32sc_insert(35826656)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826656)  leaf_p: 35826656
eb32sc_delete(35826656)
0013:TestSite.srvrep[0006:]: HTTP/1.1 200 OK
0013:TestSite.srvhdr[0006:]: Refresh: 1
0013:TestSite.srvhdr[0006:]: Server: haproxy/webstats
0013:TestSite.srvhdr[0006:]: Content-Type: text/html
0013:TestSite.srvhdr[0006:]: Content-Length: 1778
eb32sc_delete(35826016)
0013:TestSite.srvcls[0006:]
eb32sc_next(35826016)  leaf_p: 0
Segmentation fault (core dumped)

Tried to dig a little further.. pretty sure these are the steps to the 
issue. The exact reproduction probably is rather timing sensitive though.
A tcp connection 'stream task' gets closed before the applet was done 
running, and this removes the applet task from the tree. This applet task 
is however what the 'rq_next' is pointing to..


Below stack seems to be what removes and frees the applet task.. Not 
exactly sure if this is the exact one causing the crash a little later; 
it's not related to the below 'log' anyhow..


hlua_applet_http_release (ctx=0x802238780) at P:\Git\haproxy\src\hlua.c:6668
si_applet_release (si=0x8022accf0) at 
P:\Git\haproxy\include\proto\stream_interface.h:234
stream_int_shutw_applet (si=0x8022accf0) at 
P:\Git\haproxy\src\stream_interface.c:1506
si_shutw (si=0x8022accf0) at 
P:\Git\haproxy\include\proto\stream_interface.h:321

process_stream (t=0x80222a8c0) at P:\Git\haproxy\src\stream.c:2161
process_runnable_tasks () at P:\Git\haproxy\src\task.c:236
run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2404
run_thread_poll_loop (data=0x8022420a0) at P:\Git\haproxy\src\haproxy.c:2469
main (argc=5, argv=0x7fffea68) at P:\Git\haproxy\src\haproxy.c:3060


    *    Below some printf output, mostly containing function names, 
and a few extra inside the process_runnable_tasks loop to show what was 
the current and next task...

active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
find rq_next, current task 35827936  rq_current:35827936
eb32sc_next(35827936)  leaf_p: 8610368
eb32sc_delete(35827936)
process task 35827936  rq_next:0
task_wakeup  35827936  32
active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
find rq_next, current task 35827936  rq_current:35827936
eb32sc_next(35827936)  leaf_p: 8610368
eb32sc_delete(35827936)
process task 35827936  rq_next:0
task_wakeup  35827936  32
active_tasks_mask |= t->thread_mask  (35827936)
eb32sc_insert(35827936)
task_wakeup  35827616  8
active_tasks_mask |= t->thread_mask  (35827616)
eb32sc_insert(35827616)
process_runnable_tasks() active_tasks_mask &

Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-04-26 Thread PiBa-NL

Hi Thierry,

On 26-4-2018 at 12:25, thierry.fourn...@arpalert.org wrote:

Your trace shows a corrupted tree. Maybe it is due to the freebsd
architecture and the corruption is not reproducible on Linux? I do not have
freebsd for testing.

Regards,
Thierry


My 'best' reproduction scenario involves around 50 to 100 requests, or 
sometimes fewer requests total, made from my Chrome browser. (over a vpn 
connection which adds a little latency.. a ping/reply to the haproxy 
server takes +-17ms, maybe that helps reproduce.. :/ )
I've changed the lua function 'print_r' to not write anything in its 
wr() function, to rule out the freebsd console / ssh session 
causing the issue. It stays the same though.


Also been adding some printf statements to how the eb_tree is used, 
which i think might be interesting..

    printf("eb32sc_insert(%d)\n",new);
    printf("eb32sc_next(%d)  leaf_p: %d \n",eb32, eb32->node.leaf_p);
    printf("eb32sc_delete(%d)\n",eb32);

The pattern i see here is that usually a task is inserted, the next task 
is looked up, and the task gets deleted again.
Last round before the crash i see that the task is inserted, then 
deleted, and then afterwards the next task is being retrieved for that 
same 'item'.. Which fails.. Perhaps because it was the last task to 
run? And there is no longer a higher root to jump higher up the tree? 
I must admit i don't fully grasp how the tree is traversed exactly.. I 
seem to see that at least for the first task to be put into the tree 
there is some special handling.. On the other side, if it isn't in the 
tree, is there technically still a 'next' item?


Below the last part of that logging, and also attached the complete log 
from the start.. Perhaps it gives a clue.?.


Regards,

PiBa-NL (Pieter)

eb32sc_insert(35826016)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826016)  leaf_p: 8606272
eb32sc_delete(35826016)
task_wakeup  35826016  32
active_tasks_mask |= t->thread_mask  (35826016)
eb32sc_insert(35826016)
task_wakeup  35826656  8
active_tasks_mask |= t->thread_mask  (35826656)
eb32sc_insert(35826656)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35826656)  leaf_p: 35826656
eb32sc_delete(35826656)
0013:TestSite.srvrep[0006:]: HTTP/1.1 200 OK
0013:TestSite.srvhdr[0006:]: Refresh: 1
0013:TestSite.srvhdr[0006:]: Server: haproxy/webstats
0013:TestSite.srvhdr[0006:]: Content-Type: text/html
0013:TestSite.srvhdr[0006:]: Content-Length: 1778
eb32sc_delete(35826016)
0013:TestSite.srvcls[0006:]
eb32sc_next(35826016)  leaf_p: 0
Segmentation fault (core dumped)

root@freebsd11:~/.netbeans/remote/192.168.8.93/pb3-Windows-x86_64/P/Git/haproxy 
# ./haproxy -f /home/thierry/git/haproxy/bug29.conf -d -dM0x55
Note: setting global.maxconn to 2000.
Available polling systems :
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
 kqueue : disabled,  test result OK
Total: 3 (1 usable), will use poll.

Available filters :
[TRACE] trace
[COMP] compression
[SPOE] spoe
Using poll() as the polling mechanism.
task_wakeup  35824896  4
active_tasks_mask |= t->thread_mask  (35824896)
eb32sc_insert(35824896)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35824896)  leaf_p: 8606272
eb32sc_delete(35824896)
task_wakeup  35824896  8
active_tasks_mask |= t->thread_mask  (35824896)
eb32sc_insert(35824896)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35824896)  leaf_p: 8606272
eb32sc_delete(35824896)
task_wakeup  35825216  4
active_tasks_mask |= t->thread_mask  (35825216)
eb32sc_insert(35825216)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35825216)  leaf_p: 8606272
eb32sc_delete(35825216)
task_wakeup  35825216  8
active_tasks_mask |= t->thread_mask  (35825216)
eb32sc_insert(35825216)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35825216)  leaf_p: 8606272
eb32sc_delete(35825216)
[WARNING] 116/014322 (38016) : Server myservers/localSRVc is DOWN, reason: 
Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 
active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in 
queue.
task_wakeup  35824896  4
active_tasks_mask |= t->thread_mask  (35824896)
eb32sc_insert(35824896)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35824896)  leaf_p: 8606272
eb32sc_delete(35824896)
task_wakeup  35824896  8
active_tasks_mask |= t->thread_mask  (35824896)
eb32sc_insert(35824896)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35824896)  leaf_p: 8606272
eb32sc_delete(35824896)
task_wakeup  35825216  4
active_tasks_mask |= t->thread_mask  (35825216)
eb32sc_insert(35825216)
process_runnable_tasks() active_tasks_mask &= ~tid_bit
eb32sc_next(35825216)  leaf_

1.9dev LUA core.tcp() socket:receive("*l") takes 100% cpu usage

2018-04-25 Thread PiBa-NL

Hi Thierry,

Found another issue, I noticed when sending mails there were spikes of 
high cpu usage..


Below function reproduces this. Between 'task start' and 'task end' 1 
cpu core is at 100%.
I used an IP of some google.com webserver when testing, but i guess any 
webserver or mailserver that keeps the connection open and doesn't send 
the expected newline will do.


Thousands of wakeup events happen with kqueue (there is nothing 
triggering it though.. the timeout is set to 0 and returns directly):

kevent(3,0x0,0,{ },200,{ 1.0 })         = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 1.0 })         = 0 (0x0)
kevent(3,0x0,0,{ },200,{ 1.0 })         = 0 (0x0)

Though the same effect is seen with poll:
poll({ 3/POLLIN 4/POLLIN 6/POLLIN },3,0)     = 0 (0x0)
poll({ 3/POLLIN 4/POLLIN 6/POLLIN },3,0)     = 0 (0x0)
poll({ 3/POLLIN 4/POLLIN 6/POLLIN },3,0)     = 0 (0x0)

It seems like "TICK_ETERNITY" schedules the task to immediately execute 
again.?. While it sounds like it should do the opposite.. But initially 
the commit message seems to say the != TICK_ETERNITY check when setting 
the expiry is being done to avoid hanging tasks.. Perhaps some middle 
ground is possible?


Regards,

PiBa-NL (Pieter)

mytask = function()
    core.sleep(10)
    core.Info("TASK start")
    local mailconnection = core.tcp()
    mailconnection:settimeout(60)
    mailconnection = mailconnection
    ret = mailconnection:connect("127.0.0.1","80")
    repeat
        receive = mailconnection:receive("*l")
        if receive == nil then
            break
        end
        core.Info("TASK reply:"..receive)
    until false
    core.Info("TASK end")
end
core.register_task(mytask)




Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-04-25 Thread PiBa-NL

Hi Thierry,

On 25-4-2018 at 11:19, Thierry Fournier wrote:

I extracted the part which dumps the ‘core.get_info()’, and I can’t reproduce
the segfault. I attach the extracted code. I use the latest master branch.


I'm testing on master branch as well.
I started over with the extracted bug29.conf and bug29.lua configuration 
and the coredump still happens for me.


I do request all pages below more or less simultaneously with Chrome 
browser, and then after a few seconds the crash happens..

  http://haproxy:8000/webrequest/
  http://haproxy:8000/webrequest/
  http://haproxy:8000/webrequest/
  http://haproxy:8000/haproxy?stats

I've also requested both urls with 'wrk' and then it does *not* crash 
after several thousand requests to both.. There is something strange 
there.. Though chrome does of course request the /favicon.ico and sends 
way more headers..


FYI im using :
FreeBSD freebsd11 11.1-RELEASE FreeBSD 11.1-RELEASE #0 r321309: Fri Jul 
21 02:08:28 UTC 2017 
r...@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
Also when using nokqueue the issue still happens.. (for some things that 
helps.. not today.)


Attached the exact sequence of events when it happened fast. And the 
coredump backtrace belonging with that output.
Anything i can try to narrow it down further? Or perhaps leave it for 
now? As long as i don't output tons of info on console after the last 
tcp.send it seems to work okay for now..


Regards,
PiBa-NL (Pieter)
root@freebsd11:~/.netbeans/remote/192.168.8.93/pb3-Windows-x86_64/P/Git/haproxy 
# ./haproxy -f /home/thierry/git/haproxy/bug29.conf -d
Note: setting global.maxconn to 2000.
Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use kqueue.

Available filters :
[TRACE] trace
[COMP] compression
[SPOE] spoe
Using kqueue() as the polling mechanism.
[WARNING] 114/212425 (14449) : Server myservers/localSRVa is DOWN, reason: 
Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 
active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in 
queue.
[WARNING] 114/212427 (14449) : Server myservers/localSRVc is DOWN, reason: 
Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 
active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in 
queue.
:TestSite.accept(0004)=0007 from [192.168.8.116:7545] ALPN=
0001:TestSite.accept(0004)=0008 from [192.168.8.116:7548] ALPN=
0002:TestSite.accept(0004)=0009 from [192.168.8.116:7550] ALPN=
0003:TestSite.accept(0004)=000a from [192.168.8.116:7551] ALPN=
0004:TestSite.accept(0004)=000b from [192.168.8.116:7554] ALPN=
0005:TestSite.accept(0004)=000c from [192.168.8.116:7553] ALPN=
:TestSite.clireq[0007:]: GET /webrequest/ HTTP/1.1
:TestSite.clihdr[0007:]: Host: 192.168.8.93:8000
:TestSite.clihdr[0007:]: Connection: keep-alive
:TestSite.clihdr[0007:]: Cache-Control: max-age=0
:TestSite.clihdr[0007:]: Upgrade-Insecure-Requests: 1
:TestSite.clihdr[0007:]: User-Agent: Mozilla/5.0 (Windows NT 
10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 
Safari/537.36
:TestSite.clihdr[0007:]: Accept: 
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
:TestSite.clihdr[0007:]: Accept-Encoding: gzip, deflate
:TestSite.clihdr[0007:]: Accept-Language: 
nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7
:TestSite.clihdr[0007:]: Cookie: 
ASP.NET_SessionId=2dl1g5h4pldnmzerzkl3pr1a; ExactServer=
[info] 114/212430 (14449) : # CORE 1
(table) table: 0x802399840 [
"Maxpipes": (number) 0
"SslBackendKeyRate": (number) 0
"Maxsock": (number) 4014
"Maxconn": (number) 2000
"Run_queue": (number) 1
"Pid": (number) 14449
"Ulimit-n": (number) 4014
"Tasks": (number) 12
"Nbthread": (number) 1
"ConnRateLimit": (number) 0
"PipesUsed": (number) 0
"MaxConnRate": (number) 6
"Process_num": (number) 1
"CumSslConns": (number) 0
"CumConns": (number) 6
"MaxSslConns": (number) 0
"Name": (string) "HAProxy"
"SslBackendMaxKeyRate": (number) 0
"SslCacheLookups": (number) 0
"Nbproc": (number) 1
"Release_date": (string) "2018/04/25"
"node": (string) "freebsd11"
"Idle_pct": (number) 100
"Version": (string) "1.9-dev0-cd235c-360"
"MaxSslRate": (number) 0
"SslRate": (number) 0
"CompressBpsO

Re: 1.9dev LUA core.tcp() cannot be used from different threads

2018-04-25 Thread PiBa-NL

Hi Christopher, Thierry,

On 25-4-2018 at 11:30, Christopher Faulet wrote:
Oh, these tasks can be created before the threads' creation... Ok, so 
maybe the right way to fix the bug is to register these tasks 
without specific affinity and set it on the current thread the first 
time the tasks are woken up.


Here is an updated (and untested) patch. Pieter, could you check it 
please ?


Thierry, is there any way to create cosockets and applets from outside 
a lua's task and then manipulate them in the task's context ?



Thanks, works for me.
I've only tested the last patch from Christopher, and that seems to do 
the trick nicely for my situation.
If you guys come up with a different approach i'm happy to try that as 
well instead.


Regards,
PiBa-NL (Pieter)




Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-04-24 Thread PiBa-NL

Hi Tim,

On 24-4-2018 at 14:36, Tim Düsterhus wrote:

Hi

On 23.04.2018 at 22:36, PiBa-NL wrote:

Is there a bug in my script, or is it more likely that 'something' needs
fixing in the lua api / interaction?

I poked around a bit: The cause in this case is the Content-Length
header. It causes that haproxy does not use chunked encoding for the output.

My suspicion is some kind of "race condition". It looks like that the
applet function does not get scheduled any more, once all data is sent
over the wire and thus the output to stdout is not printed in all cases.

I could not reproduce the issue if I added another `applet:send()` below
the second print_r. I also could not reproduce the issue if the
Content-Length header specifies a length *greater* than the actual
length of the content. I could however reproduce it, if the
Content-Length header specifies a length *smaller* than the actual
length of the content.

Best regards
Tim Düsterhus


Thanks for investigating, i did a little more of my own also :) and got 
some results (also a crash..).
As you said: if i make the Content-Length string.len(response)+1 and do 
another send("1") at the end, it suddenly runs fine.


Running from gdb directly allowed me to get a readable backtrace of the 
crash that seems to happen due to the print_r of the core.get_info() ..

Requesting the stats page from 1 and the coreinfo from 3 browserwindows.

When just requesting the core.get_info() and NOT printing it to 
core.Info() it runs fine for a long time.
Somehow this output method seems related to some serious issue, at least 
when combined with large output from lua.. Perhaps there is a deeper 
issue?
Technically it's probably not advisable to dump such large outputs on 
the console.. But still, to crash on that, i didn't expect it, and 
dumping info on console is the only way i can easily 'debug' the script's 
interaction with haproxy..


Anyhow, hoping it can be fixed, or maybe even just have the string 
truncated at its maximum buffer length?


Regards,

PiBa-NL (Pieter)

global
  nbthread 1
  lua-load /root/haproxytest/print_r.lua
  lua-load /root/haproxytest/smtpmailqueue/smtpmailqueue.lua
  lua-load /root/haproxytest/serverhealthchecker/serverhealthchecker.lua
  lua-load /root/haproxytest/serverhealth_smtpmail.lua

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s

frontend TestSite
    bind *:80

    acl webrequest path -m beg /webrequest
    http-request use-service lua.testitweb-webrequest if webrequest

    stats enable
    stats admin if TRUE
    stats refresh 1s

    # prevent overloading yourself with loopback requests..
    acl isloopback src 127.0.0.0/8
    http-request deny if isloopback

    default_backend myservers

backend myservers
    server localSRVa 127.0.0.1:80 check
    server localSRVb 127.0.0.1:81 check inter 20s
    server localSRVc 127.0.0.1:82 check

# serverhealth_smtpmail.lua ##

local smtpmailer = Smtpmailqueue("luamailer",5)
smtpmailer:setserver("127.0.0.1","25")

local checknotifier = function(subject, message, serverstatecounters, 
allstates)
smtpmailer:addmail("haproxy@domain.local","itguy@domain.local","[srv-checker]"..subject, 
message.."\r\n"..serverstatecounters.."\r\n\r\n"..allstates)

end
local mychecker = Serverhealthchecker("hapchecker",3,2,checknotifier)

testitweb = {}
testitweb.webrequest = function(applet)
    if string.match(applet['path'],"/webrequest/mailstat") then
        return smtpmailer:webstats(applet)
    end

  if string.match(applet['path'],"/webrequest/coreinfo") then
    core.Info("# CORE 1")
    local cor = core.get_info()
    print_r(cor)
    core.Info("# CORE 1 ^")

    local resp = ""
    print_r(core.get_info(),false,function(x)
    resp=resp..string.gsub(x,"\n","")
    end
    )
    response = "CoreInfo:"..resp

    applet:add_header("Server", "haproxy/webstats")
    applet:add_header("Content-Length", string.len(response))
    applet:add_header("Content-Type", "text/html")
    applet:add_header("Refresh", "1")
    applet:start_response()
    applet:send(response)

    core.Info("# CORE 2")
    local cor = core.get_info()
    print_r(cor)
    core.Info("# CORE 2 ^")
  end
end
core.register_service("testitweb-webrequest", "http", testitweb.webrequest)


[info] 113/212806 (10702) : # CORE 1 ^
[info] 113/212806 (10702) : # CORE 2
(table) table: 0x8022d4900 [
    "Memmax_MB": (number) 0
    "Pid": (number) 10702
    "Uptime_sec": (number) 9
    "PipesUsed": (number) 0
 

1.9dev LUA core.tcp() cannot be used from different threads

2018-04-23 Thread PiBa-NL

Hi List, Thierry (LUA maintainer), Christopher (Multi-Threading),

When im making a tcp connection to a (mail) server from a lua task, this 
error pops up randomly when using 'nbthread 4'. The error luckily seems 
pretty self explanatory, but ill leave it to the threading and lua 
experts to come up with a fix ;) i think somehow the script, or at least 
its socket commands, must be forced to always be executed on the same 
thread? or perhaps there is another way..


Also i do wonder how far lua is safe to use at all in a multithreaded 
program. Or would that become impossible to keep safe? But that's a bit 
offtopic perhaps..


Line 240: recieve = mailer.receive(mailer, "*l")
[ALERT] 110/232212 (678) : Lua task: runtime error: 
/root/haproxytest/test.lua:240: connect: cannot use socket on other thread.


Line 266:  local mailer = core.tcp()
Line 267:    ret = mailer.connect(mailer, self.mailserver, 
self.mailserverport)
[ALERT] 110/232321 (682) : Lua task: runtime error: 
/root/haproxytest/test.lua:267: connect: cannot use socket on other thread.


Let me know if there is a patch or something else i can test/check. Or 
should configure differently.?.

Thanks in advance.

Regards,

PiBa-NL (Pieter)

 haproxy.conf & lua scripts
Basically the serverhealth_smtpmail_haproxy.conf 
<https://github.com/PiBa-NL/MyPublicProjects/blob/master/haproxy/lua-scripts/serverhealth_smtpmail_haproxy.conf> 
and the files it links to are here:

https://github.com/PiBa-NL/MyPublicProjects/tree/master/haproxy/lua-scripts

p.s.
The 'mailer' code if anyone is interested that was used is written in 
some 'libraries' ive committed on github link, maybe they are of use to 
someone else as well :) comments and fixes are welcome ;).. They are 
'first versions' but seem functional with limited testing sofar :).





1.9dev LUA register_task to function that ends performs a core dump..

2018-04-23 Thread PiBa-NL

Hi List, Thierry,

Below script makes haproxy perform a coredump when a function that 
doesn't loop forever is put into register_task.. is it possible to add 
some safety checks around such calls?


The coredump does not seem to contain any useful info when read by gdb.. 
unknown functions at unknown addresses...


Also i tried to register a new second task inside the c==5 check, but 
then it just seemed to hang..


Maybe not really important, as people should probably never use a 
function that can exit for a task.. but it's never nice to have 
something perform a coredump..


Regards,

PiBa-NL (Pieter)

 haproxy.conf 

global
  nbthread 1
  lua-load /root/haproxytest/print_r.lua
  lua-load /root/haproxytest/test.lua

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s

frontend TestSite
    bind *:80

 Lua script 

mytask = function()
    c = 0
    repeat
        core.Info("Task")
        core.sleep(1)
        c = c + 1
        if c == 3 then
            break
        end
    until false
    core.Info("Stopping task")
end
core.register_task(mytask)

 output ###

[info] 112/224221 (7881) : Task
[info] 112/224222 (7881) : Task
[info] 112/224223 (7881) : Task
[info] 112/224224 (7881) : Stopping task
Segmentation fault (core dumped)




1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-04-23 Thread PiBa-NL

Hi List, Thierry,

The second print_r(core.get_info()) only shows 'some' of its results and 
the final message never shows..
Is there some memory buffer overflow bug in there? Possibly caused by 
the 'add_header' calls.. as removing those seems to fix the behaviour of 
the CORE2 print_r call..


Using haproxy 1.9dev, with config below on FreeBSD.

Is there a bug in my script, or is it more likely that 'something' needs 
fixing in the lua api / interaction?
Lemme know what i can do to help track this down somehow.. I tried 
memory 'poisoning' in haproxy but that doesn't seem to change any of the 
effects i'm seeing..


Regards,

PiBa-NL (Pieter)


 Content of haproxy.conf 

global
  nbthread 1
  lua-load /root/haproxytest/print_r.lua
  lua-load /root/haproxytest/test.lua

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 60s

frontend TestSite
    bind *:80

    acl webrequest path -m beg /webrequest
    http-request use-service lua.testitweb-webrequest if webrequest

 Content of test.lua 

testitweb = {}
testitweb.webrequest = function(applet)
        core.Info("# CORE 1")
        print_r(core.get_info())
        core.Info("# CORE 1 ^")

        local resp = ""
        print_r(core.get_info(),false,function(x)
            resp=resp..string.gsub(x,"\n","")
        end
        )
        response = "CoreInfo:"..resp

        applet:add_header("Server", "haproxy/webstats")
        applet:add_header("Content-Length", string.len(response))
        applet:add_header("Content-Type", "text/html")
        applet:add_header("Refresh", "10")
        applet:start_response()
        applet:send(response)

        core.Info("# CORE 2")
        print_r(core.get_info())
        core.Info("# CORE 2 ^")
    end
core.register_service("testitweb-webrequest", "http", testitweb.webrequest)


 (partial) output :

First CORE1 gets printed fully, until the last item of the get_info 
table in memory, CumConns in this case (that one's position is random 
though upon each start..)


    "Uptime_sec": (number) 3
    "Pid": (number) 7848
    "CumConns": (number) 3
]
[info] 112/222621 (7848) : # CORE 1 ^
[info] 112/222621 (7848) : # CORE 2
(table) table: 0x8023ff540 [
    "CurrSslConns": (number) 0
    "Version": (string) "1.9-dev0-564d15-357"
    "SslRate": (number) 0
    "PoolAlloc_MB": (number) 0
    "Hard_maxconn": (number) 2000
    "Nbthread": (number) 1
    "CurrConns": (number) 1
    "Memmax_MB": (number) 0
    "Maxsock": (number) 4011
    "ConnRateLimit": (number) 0
    "CompressBpsIn": (number) 0
    "Process_num": (number) 1
    "node": (string) "freebsd11"
    "Idle_pct": (number) 100
    "SessRate": (number) 1
    "CompressBpsRateLim": (number) 0
    "Tasks": (number) 4
    "Release_date": [info] 112/222622 (7848) : # CORE 1
(table) table: 0x8023ffc80 [
    "CurrSslConns": (number) 0
    "Version": (string) "1.9-dev0-564d15-357"
    "SslRate": (number) 0
    "PoolAlloc_MB": (number) 0
    "Hard_maxconn": (number) 2000
    "Nbthread": (number) 1
    "CurrConns": (number) 1
    "Memmax_MB": (number) 0
    "Maxsock": (number) 4011

As you can see the CORE2 is truncated and a new CORE1 continues printing 
after a new call to the webservice is made.. (there was time between it 
stopping output on screen and the next web call..)





Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-17 Thread PiBa-NL

On 17-4-2018 at 17:46, Willy Tarreau wrote:

On Tue, Apr 17, 2018 at 04:33:07PM +0200, Olivier Houchard wrote:

After talking with Willy, here is an updated patch that does that.
That way, the day we'll want to use EV_ONESHOT, we'll be ready, and won't
miss any event.

Now merged, thanks guys!
Willy


Thanks!
Works as expected with current master.

Do i dare ask an estimate when 1.8.8 might be released? (oh, just did. ;) )
O well, i guess ill run production with nokqueue for a little while; 
that works alright for me so far.


Regards,
PiBa-NL (Pieter)




Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-16 Thread PiBa-NL

Hi Olivier,

On 16-4-2018 at 17:09, Olivier Houchard wrote:

After some discussion with Willy, we came with a solution that may fix your
problem with kqueue.
Can you test the attached patch and let me know if it fixes it for you ?

Minor variation of the patch, that uses EV_RECEIPT if available, to avoid
scanning needlessly the kqueue.

Regards,

Olivier


Thanks, the patch solves the issue i experienced, at least for the 
testcase that i had. (And it doesn't seem to cause obvious new issues 
that i could quickly spot..) Both with and without EV_RECEIPT on kev[0] 
it seems to work the same for my testcase..


Just a few thoughts though:
Now only the first kev[0] gets the EV_RECEIPT flag; shouldn't it be 
added to all items in the array? Now sometimes 3 changes are sent and 
only 2 'results' are reported back. If i read right, the EV_RECEIPT 
should 'force' a result for each change sent. Also, is there a reason you 
put it inside a '#ifdef'? It seems to me a hard requirement to not read 
any possible pending events when sending the list of updated filters at 
that moment. Or perhaps it's possible to call kevent only once? Both 
sending changes and receiving new events in 1 big go, and without the 
RECEIPT flag?


There are now more 'changes' sent than required, which need to be 
disregarded with an 'error flag' by kqueue.
Doesn't that (slightly) affect performance? Or would checking a bitmask 
beforehand not be cheaper than what kevent itself needs to do to ignore 
an item and 'error report' some of the changes? I've not tried to 
measure this, but technically i think there will be a few more cpu 
operations needed overall this way.


Regards,

PiBa-NL (Pieter)




Re: [PATCH] BUG/MEDIUM: kqueue/poll: only use EV_SET when actually needing to add or remove event filters

2018-04-15 Thread PiBa-NL

Hi Willy,

On 15-4-2018 at 23:08, Willy Tarreau wrote:

Well to be clear, I'm pretty sure we're hiding the dust under the carpet
here, even if it fixes the problem in your case. What I need to do is to
actually understand why we end up in this situation.


Okay, added a little more code/error logging to help understand what's 
going on. Like below, with the original code from master.
This error pops up with code '2': "[ENOENT] The event could not be found 
to be modified or deleted."
This prevents the "EVFILT_READ, EV_ADD (7)" from taking effect and 
reading the second browser request that includes the NTLM credentials..

Added logging:

    if (changes) {
        errno = -1;
        int x = kevent(kqueue_fd[tid], kev, changes, NULL, 0, NULL);
        int e = errno;
        fprintf(stdout, "    Events changed:%d result:%d err:%d\n", 
changes, x, e);

    }

The result is like this:

Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
Using kqueue() as the polling mechanism.
    EVFILT_READ, EV_ADD (4)
    EVFILT_READ, EV_ADD (5)
    Events changed:2 result:0 err:-1
KernelEvent kev for FD:(4) filter:-1
:Syner.accept(0004)=0007 from [192.168.8.116:4096] ALPN=
    EVFILT_READ, EV_ADD (7)
    Events changed:1 result:0 err:-1
KernelEvent kev for FD:(7) filter:-1
:Syner.clireq[0007:]: GET 
/SynEnt/docs/HRMResourceCard.aspx?ID=7 HTTP/1.1
:Syner.clihdr[0007:]: Accept: text/html, 
application/xhtml+xml, image/jxr, */*
:Syner.clihdr[0007:]: Accept-Language: 
nl-NL,nl;q=0.8,en-GB;q=0.5,en;q=0.3
:Syner.clihdr[0007:]: User-Agent: Mozilla/5.0 (Windows 
NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko

:Syner.clihdr[0007:]: Accept-Encoding: gzip, deflate
:Syner.clihdr[0007:]: Host: 192.168.8.93
:Syner.clihdr[0007:]: Connection: Keep-Alive
    EVFILT_READ, EV_DELETE (7)
    Events changed:1 result:0 err:-1
    EVFILT_WRITE, EV_ADD (8)
    Events changed:1 result:0 err:-1
KernelEvent kev for FD:(8) filter:-2
    EVFILT_READ, EV_ADD (8)
    EVFILT_WRITE, EV_DELETE (8)
    Events changed:2 result:0 err:-1
KernelEvent kev for FD:(8) filter:-1
KernelEvent kev for FD:(8) filter:-1
    EVFILT_READ, EV_DELETE (8)
    EVFILT_WRITE, EV_DELETE (8)
    Events changed:2 result:-1 err:2 <<<<<<<<<<< ERROR while deleting a 
non existing event

    EVFILT_READ, EV_ADD (8)
    Events changed:1 result:0 err:-1
KernelEvent kev for FD:(8) filter:-1
:Syner.srvrep[0007:0008]: HTTP/1.1 401 Unauthorized
:Syner.srvhdr[0007:0008]: Content-Type: text/html
:Syner.srvhdr[0007:0008]: Server: Microsoft-IIS/7.5
:Syner.srvhdr[0007:0008]: WWW-Authenticate: NTLM
:Syner.srvhdr[0007:0008]: WWW-Authenticate: Negotiate
:Syner.srvhdr[0007:0008]: X-Powered-By: ASP.NET
:Syner.srvhdr[0007:0008]: Date: Mon, 16 Apr 2018 00:12:47 GMT
:Syner.srvhdr[0007:0008]: Content-Length: 1332
    EVFILT_READ, EV_ADD (8)
    EVFILT_WRITE, EV_DELETE (8)
    EVFILT_READ, EV_ADD (7)
    Events changed:3 result:-1 err:2 <<<<<<<<<<< ERROR while deleting a non existing event


After this the KernelEvent for FD 7 that should read the second browser 
request never happens.

I think we can conclude that deleting events that don't exist is a bad thing?
I'll leave further discussion about the why and how to you and Oliver :).

Regards,
PiBa-NL (Pieter)




[PATCH] BUG/MEDIUM: kqueue/poll: only use EV_SET when actually needing to add or remove event filters

2018-04-15 Thread PiBa-NL

Hi Willy,

Sending a patch proposal after some 40 hours of looking through what happens and what event we might be missing; I'm now changing +-40 lines of code. It doesn't 'really' change the events requested, but rather brings the old 'previous state' comparison back.

Let me know if it's okay like this, whether you would like the new variable renamed or the if/then/else restructured, or if you just don't like it at all ;).

In the last case, please do give a hint about what you would like instead :).

Regards,
PiBa-NL (Pieter)
From 21c191c036f740eb75a4fa59c23232b910cd695c Mon Sep 17 00:00:00 2001
From: PiBa-NL <piba.nl@gmail.com>
Date: Sun, 15 Apr 2018 22:20:22 +0200
Subject: [PATCH] BUG/MEDIUM: kqueue/poll: only use EV_SET when actually
 needing to add or remove event filters

Avoid event filters being added twice, or deleted while not present, causing hanging requests.
Known reproduction: with haproxy on a FreeBSD machine, an IIS website with NTLM authentication, and option http-tunnel, haproxy would fail to receive the request carrying the added credentials to pass to the backend.
---
 include/types/fd.h |  1 +
 src/ev_kqueue.c    | 36 +++++++++++++++++++++++-----------
 src/fd.c   |  1 +
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/types/fd.h b/include/types/fd.h
index 0902e7f..243d7bb 100644
--- a/include/types/fd.h
+++ b/include/types/fd.h
@@ -115,6 +115,7 @@ struct fdtab {
 	__decl_hathreads(HA_SPINLOCK_T lock);
 	unsigned long thread_mask;       /* mask of thread IDs authorized to process the task */
 	unsigned long polled_mask;       /* mask of thread IDs currently polling this fd */
+	unsigned long polled_mask_write; /* mask of thread IDs currently polling this fd for writes */
 	unsigned long update_mask;       /* mask of thread IDs having an update for fd */
 	struct fdlist_entry cache;       /* Entry in the fdcache */
 	void (*iocb)(int fd);            /* I/O handler */
diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece..ecadec3 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -56,29 +56,43 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 		en = fdtab[fd].state;
 
 		if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
-			if (!(fdtab[fd].polled_mask & tid_bit)) {
+			if (!(fdtab[fd].polled_mask & tid_bit) && !(fdtab[fd].polled_mask_write & tid_bit)) {
 				/* fd was not watched, it's still not */
 				continue;
 			}
 			/* fd totally removed from poll list */
-			EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
-			EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_DELETE, 0, 0, NULL);
-			HA_ATOMIC_AND(&fdtab[fd].polled_mask, ~tid_bit);
+			if (fdtab[fd].polled_mask & tid_bit) {
+				EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
+				HA_ATOMIC_AND(&fdtab[fd].polled_mask, ~tid_bit);
+			}
+			if (fdtab[fd].polled_mask_write & tid_bit) {
+				EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_DELETE, 0, 0, NULL);
+				HA_ATOMIC_AND(&fdtab[fd].polled_mask_write, ~tid_bit);
+			}
 		}
 		else {
 			/* OK fd has to be monitored, it was either added or changed */
 
-			if (en & FD_EV_POLLED_R)
-				EV_SET(&kev[changes++], fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
-			else if (fdtab[fd].polled_mask & tid_bit)
+			if (en & FD_EV_POLLED_R) {
+				if (!(fdtab[fd].polled_mask & tid_bit)) {
+					EV_SET(&kev[changes++], fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
+					HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
+				}
+			} else if (fdtab[fd].polled_mask & tid_bit) {
 				EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
+				HA_ATOMIC_AND(&fdtab[fd].polled_mask, ~tid_bit);
+			}
 
-			if (en & FD_EV_POLLED_W)
-				EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
-			else if (fdtab[fd].polled_mask & tid_bit)
+			if (en & FD_EV_POLLED_W) {
+				if (!(fdtab[fd].polled_mask_write & tid_bit)) {
+					EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
+

Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

And a second mail, as I just thought of one extra thing you wrote that maybe I misunderstand, or perhaps I confused you with a small remark about CPU usage in my earlier mail (that was a side effect of my other earlier, but totally wrong, code change..).

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

To clarify: the issue is not that haproxy uses CPU by looping; the issue is that haproxy prevents the page from loading in the browser. The 'fix' on the old version, just after the commit introducing the issue, was to call the EV_SET write delete *less* often. Or maybe my understanding of what it does is just wrong :).


Op 13-4-2018 om 0:57 schreef PiBa-NL:

Hi Willy,

Op 13-4-2018 om 0:22 schreef Willy Tarreau:

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

By the way you must really not try to debug an
old version but stick to the latest fixes.
Okay, testing from now on with current master. I just thought it would be easier to backtrack if I knew what particular new/missing event might cause it, and it could have been simpler to find a fix just after the problem was introduced, but it seems it ain't that simple :).


I'm seeing two things that could be of interest to test :
   - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
 to delete the events. It will slightly inflate the list of events
 but not that much. If it fixes the problem it means that the
 polled_mask is sometimes wrong. Please do that with the updated
 master.
Removing the 'if polled_mask' checks does not fix the issue; in fact it makes it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also no longer shown without those checks..


   - switch to poll() just to see if you have the same so that we can
 figure if only the kqueue code triggers the issue. poll() doesn't
 rely on polled_mask at all.

Using poll (startup with -dk) the request works properly.


Many thanks for your tests.
Willy


Regards,

PiBa-NL (Pieter)



Regards,

PiBa-NL (Pieter)




Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

Op 13-4-2018 om 0:22 schreef Willy Tarreau:

By the way you must really not try to debug an
old version but stick to the latest fixes.
Okay, testing from now on with current master. I just thought it would be easier to backtrack if I knew what particular new/missing event might cause it, and it could have been simpler to find a fix just after the problem was introduced, but it seems it ain't that simple :).


I'm seeing two things that could be of interest to test :
   - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
 to delete the events. It will slightly inflate the list of events
 but not that much. If it fixes the problem it means that the
 polled_mask is sometimes wrong. Please do that with the updated
 master.
Removing the 'if polled_mask' checks does not fix the issue; in fact it makes it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also no longer shown without those checks..


   - switch to poll() just to see if you have the same so that we can
 figure if only the kqueue code triggers the issue. poll() doesn't
 rely on polled_mask at all.

Using poll (startup with -dk) the request works properly.


Many thanks for your tests.
Willy


Regards,

PiBa-NL (Pieter)




Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

Op 12-4-2018 om 1:19 schreef Willy Tarreau:

Thank you very much for pointing the exact line that causes you trouble.
Well, 'exact line'.. probably not the right one. And yes, just removing that line indeed breaks something else (as expected..).

Would you have the ability to try the latest 1.9-dev just by chance ?
Yes; when I started compiling from source, that's where I started: the current master branch. Though (sadly) it has the same broken effect.


I'll try to investigate a bit more.. Though if you think of a possible patch, I'm happy to try anything out ;)


Regards,
PiBa-NL (Pieter)



Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-11 Thread PiBa-NL

Hi List / Willy,

Removing the line below 'fixes' my issue with the kqueue poller and NTLM authentication with option http-tunnel.. Though I'm sure something else is then broken horribly (CPU goes to 100%..), and I'm not sure what the proper fix would be. (I've got too little knowledge of what the various flags do, and C ain't a language I normally look at..)
The 'breaking' commit was this one:
http://git.haproxy.org/?p=haproxy-1.8.git;a=commit;h=f839593dd26ec210ba66d74b2a4c2040dd1ed806


Can you take a new look at that piece of code? (as the commit was yours ;) )
Thanks in advance :).

Regards,
PiBa-NL (Pieter)

 src/ev_kqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index a103ece..49e7302 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -78,7 +78,7 @@ REGPRM2 static void _do_poll(struct poller *p, int exp)
 			else if (fdtab[fd].polled_mask & tid_bit)
 				EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_DELETE, 0, 0, NULL);
 
-			HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
+//			HA_ATOMIC_OR(&fdtab[fd].polled_mask, tid_bit);
 		}
 	}
 	if (changes)



Op 10-4-2018 om 23:11 schreef PiBa-NL:

Hi Haproxy List,

I upgraded to 1.8.7 (coming from 1.8.3) and found I could no longer use one of our IIS websites. The login procedure that uses Windows authentication / NTLM seems to fail..
Removing option http-tunnel seems to fix this, though. Afaik http-tunnel 'should' switch to tunnel mode after the first request and as such should have no issue sending the credentials to the server?


Below are:  config / haproxy -vv / tcpdump / sess all

Is it a known issue? Is there anything else i can provide?

Regards,

PiBa-NL (Pieter)

-
# Automaticaly generated, dont edit manually.
# Generated on: 2018-04-10 21:00
global
    maxconn            1000
    log            192.168.8.10    local1    info
    stats socket /tmp/haproxy.socket level admin
    gid            80
    nbproc            1
    nbthread            1
    hard-stop-after        15m
    chroot                /tmp/haproxy_chroot
    daemon
    tune.ssl.default-dh-param    2048
    defaults
    option log-health-checks


frontend site.domain.nl2
    bind            192.168.8.5:443 name 192.168.8.5:443  ssl  crt 
/var/etc/haproxy/site.domain.nl2.pem crt-list 
/var/etc/haproxy/site.domain.nl2.crt_list

    mode            http
    log            global
    option            httplog
    option            http-tunnel
    maxconn            100
    timeout client        1h
    option tcplog
    default_backend website-intern_http_ipvANY

backend site-intern_http_ipvANY
    mode            http
    log            global
    option            http-tunnel
    timeout connect        10s
    timeout server        1h
    retries            3
    server            site 192.168.13.44:443 ssl  weight 1.1 verify none

-
[2.4.3-RELEASE][root@pfsense_5.local]/root: haproxy -vv
HA-Proxy version 1.8.7 2018/04/07
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -pipe -fstack-protector -fno-strict-aliasing 
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-fno-strict-overflow -Wno-address-of-packed-member 
-Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
Running on OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
-
tcpdump of : Client 8.32>Haproxy 8.5:

21:09:13.452118 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [S], 
seq 1417754656, win 8192, options [mss 1260,nop,wscale 
8,nop,

1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-10 Thread PiBa-NL

Hi Haproxy List,

I upgraded to 1.8.7 (coming from 1.8.3) and found I could no longer use one of our IIS websites. The login procedure that uses Windows authentication / NTLM seems to fail..
Removing option http-tunnel seems to fix this, though. Afaik http-tunnel 'should' switch to tunnel mode after the first request and as such should have no issue sending the credentials to the server?


Below are:  config / haproxy -vv / tcpdump / sess all

Is it a known issue? Is there anything else i can provide?

Regards,

PiBa-NL (Pieter)

-
# Automaticaly generated, dont edit manually.
# Generated on: 2018-04-10 21:00
global
    maxconn            1000
    log            192.168.8.10    local1    info
    stats socket /tmp/haproxy.socket level admin
    gid            80
    nbproc            1
    nbthread            1
    hard-stop-after        15m
    chroot                /tmp/haproxy_chroot
    daemon
    tune.ssl.default-dh-param    2048
    defaults
    option log-health-checks


frontend site.domain.nl2
    bind            192.168.8.5:443 name 192.168.8.5:443  ssl  crt 
/var/etc/haproxy/site.domain.nl2.pem crt-list 
/var/etc/haproxy/site.domain.nl2.crt_list

    mode            http
    log            global
    option            httplog
    option            http-tunnel
    maxconn            100
    timeout client        1h
    option tcplog
    default_backend website-intern_http_ipvANY

backend site-intern_http_ipvANY
    mode            http
    log            global
    option            http-tunnel
    timeout connect        10s
    timeout server        1h
    retries            3
    server            site 192.168.13.44:443 ssl  weight 1.1 verify none

-
[2.4.3-RELEASE][root@pfsense_5.local]/root: haproxy -vv
HA-Proxy version 1.8.7 2018/04/07
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -pipe -fstack-protector -fno-strict-aliasing 
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-fno-strict-overflow -Wno-address-of-packed-member -Wno-null-dereference 
-Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
Running on OpenSSL version : OpenSSL 1.0.2m-freebsd  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
-
tcpdump of : Client 8.32>Haproxy 8.5:

21:09:13.452118 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [S], seq 
1417754656, win 8192, options [mss 1260,nop,wscale 8,nop,nop,sackOK], 
length 0
21:09:13.452312 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [S.], seq 
1950703403, ack 1417754657, win 65228, options [mss 1260,nop,wscale 
7,sackOK,eol], length 0
21:09:13.453030 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack 
1, win 260, length 0
21:09:13.457740 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq 
1:190, ack 1, win 260, length 189
21:09:13.457762 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack 
190, win 510, length 0
21:09:13.459503 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], seq 
1:1261, ack 190, win 511, length 1260
21:09:13.459516 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], seq 
1261:2521, ack 190, win 511, length 1260
21:09:13.459527 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P.], seq 
2521:2686, ack 190, win 511, length 165
21:09:13.460342 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [.], ack 
2686, win 260, length 0
21:09:13.478984 IP 192.168.8.32.51658 > 192.168.8.5.443: Flags [P.], seq 
190:316, ack 2686, win 260, length 126
21:09:13.479038 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [.], ack 
316, win 510, length 0
21:09:13.480105 IP 192.168.8.5.443 > 192.168.8.32.51658: Flags [P

Re: 答复: proxy error 502

2018-04-02 Thread PiBa-NL

Hi Ricky,

Probably found the anomaly in the filename header, ':' vs '=': the header should probably be "filename: a.pdf" instead of "filename= a.pdf".

[2.4.4-DEVELOPMENT][root@pfSe.localdomain]/root: echo "show errors" | 
socat stdio /var/lib/haproxy/stats | head -n 30

Total events captured on [02/Apr/2018:14:42:10.603] : 2

[02/Apr/2018:14:40:57.817] backend app (#3): invalid response
  frontend main (#2), server app1 (#1), event #1
  src 192.168.0.40:49068, session #9, session flags 0x04ce
  HTTP msg state MSG_HDR_NAME(17), msg flags 0x, tx flags 
0x28603000

  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x80008002, out 0 bytes, total 15360 bytes
  pending 15360 bytes, wrapping at 16384, error at position 155:

  0  HTTP/1.1 200 OK\r\n
  00017  Server: nginx\r\n
  00032  Date: Mon, 02 Apr 2018 12:40:57 GMT\r\n
  00069  Content-Type: application/pdf\r\n
  00100  Transfer-Encoding: chunked\r\n
  00128  Connection: close\r\n
  00147 filename=a.pdf: \r\n
  00165  Strict-Transport-Security: max-age=31536000\r\n
  00210  X-Content-Type-Options: nosniff\r\n
  00243  \r\n
  00245  1fb8\r\n
  00251  %PDF-1.5\r\n


Op 2-4-2018 om 5:22 schreef Xu Ricky:


Thank you for your reply. Sorry, I am not a native English speaker, so this is probably not very accurate.


http://exemple.com/pdf.php

gives error 502. I've been trying this for weeks.

1. But nginx can be accessed directly.

2. Haproxy+apache+php is ok.

3. http://exemple.com/a.pdf is ok.

4. Haproxy tcp proxy is ok.

5.

[root@t08 haproxy-1.8.5]# haproxy -f /etc/haproxy/haproxy.cfg -d

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
   [SPOE] spoe
   [COMP] compression
   [TRACE] trace
Using epoll() as the polling mechanism.
:main.accept(0005)=0009 from [192.168.241.40:2787] ALPN=
0001:main.accept(0005)=000a from [192.168.241.40:2788] ALPN=
:main.clireq[0009:]: GET /pdf.php HTTP/1.1
:main.clihdr[0009:]: Host: 192.168.241.18
:main.clihdr[0009:]: Connection: keep-alive
:main.clihdr[0009:]: Cache-Control: max-age=0
:main.clihdr[0009:]: Upgrade-Insecure-Requests: 1
:main.clihdr[0009:]: User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36
:main.clihdr[0009:]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
:main.clihdr[0009:]: Accept-Encoding: gzip, deflate
:main.clihdr[0009:]: Accept-Language: zh-CN,zh;q=0.9
:app.srvcls[0009:adfd]
:app.clicls[0009:adfd]
:app.closed[0009:adfd]
0001:main.clicls[000a:]
0001:main.closed[000a:]

*From:* PiBa-NL [mailto:piba.nl@gmail.com]
*Sent:* 2018-03-31 1:20
*To:* Xu Ricky <xu.binf...@live.com>; haproxy@formilux.org
*Subject:* Re: proxy error 502

Hi Ricky,

Works for me with your configuration, mostly.
Adding a bind to the frontend and using haproxy 1.8.3 (it doesn't 
allow the implicit bind on the frontend line itself..).

Also added the fastcgi config and an empty mimetype file..

[2.4.3-RELEASE][root@pfSe.localdomain 
<mailto:root@pfSe.localdomain>]/root: haproxy -f 
/root/XuRicky/haproxy.cfg -d

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
Using kqueue() as the polling mechanism.
:main.accept(0005)=0009 from [192.168.0.40:56777] ALPN=
0001:main.accept(0005)=000a from [192.168.0.40:56778] ALPN=
0002:main.accept(0005)=000b from [192.168.0.40:56779] ALPN=
:main.clireq[0009:]: GET /a.pdf HTTP/1.1
:main.clihdr[0009:]: Host: 192.168.0.133
:main.clihdr[0009:]: Connection: keep-alive
:main.clihdr[0009:]: Upgrade-Insecure-Requests: 1
:main.clihdr[0009:]: User-Agent: Mozilla/5.0 (Windows 
NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like G ecko) 
Chrome/65.0.3325.181 Safari/537.36
:main.clihdr[0009:]: Accept: 
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/* 
;q=0.8

:main.clihdr[0009:]: Accept-Encoding: gzip, deflate
:main.clihdr[0009:]: Accept-Language: 
nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7

:app.srvrep[0009:000c]: HTTP/1.1 200 OK
:app.srvhdr[0009:000c]: Server: nginx/1.12.2
:app.srvhdr[0009:000c]: Date: Fri, 30 Mar 2018 12:52:26 GMT
:app.srvhdr[0009:000c]:

Re: proxy error 502

2018-03-30 Thread PiBa-NL

Hi Ricky,

Works for me with your configuration, mostly.
Adding a bind to the frontend and using haproxy 1.8.3 (it doesn't allow 
the implicit bind on the frontend line itself..).

Also added the fastcgi config and an empty mimetype file..

[2.4.3-RELEASE][root@pfSe.localdomain]/root: haproxy -f 
/root/XuRicky/haproxy.cfg -d

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
Using kqueue() as the polling mechanism.
:main.accept(0005)=0009 from [192.168.0.40:56777] ALPN=
0001:main.accept(0005)=000a from [192.168.0.40:56778] ALPN=
0002:main.accept(0005)=000b from [192.168.0.40:56779] ALPN=
:main.clireq[0009:]: GET /a.pdf HTTP/1.1
:main.clihdr[0009:]: Host: 192.168.0.133
:main.clihdr[0009:]: Connection: keep-alive
:main.clihdr[0009:]: Upgrade-Insecure-Requests: 1
:main.clihdr[0009:]: User-Agent: Mozilla/5.0 (Windows NT 
10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like 
G ecko) 
Chrome/65.0.3325.181 Safari/537.36
:main.clihdr[0009:]: Accept: 
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/* 
;q=0.8

:main.clihdr[0009:]: Accept-Encoding: gzip, deflate
:main.clihdr[0009:]: Accept-Language: 
nl-NL,nl;q=0.9,en-US;q=0.8,en;q=0.7

:app.srvrep[0009:000c]: HTTP/1.1 200 OK
:app.srvhdr[0009:000c]: Server: nginx/1.12.2
:app.srvhdr[0009:000c]: Date: Fri, 30 Mar 2018 12:52:26 GMT
:app.srvhdr[0009:000c]: Content-Type: application/octet-stream
:app.srvhdr[0009:000c]: Content-Length: 1625682
:app.srvhdr[0009:000c]: Last-Modified: Thu, 29 Mar 2018 23:20:07 GMT
:app.srvhdr[0009:000c]: Connection: close
:app.srvhdr[0009:000c]: ETag: "5abd74a7-18ce52"
:app.srvhdr[0009:000c]: Accept-Ranges: bytes
:app.srvcls[0009:adfd]
0001:main.clicls[000a:]
0001:main.closed[000a:]
0002:main.clicls[000b:]
0002:main.closed[000b:]
0003:main.clicls[0009:]
0003:main.closed[0009:]

Regards,
PiBa-NL (Pieter)

Op 30-3-2018 om 4:47 schreef Xu Ricky:


Hi:

File.zip (nginx haproxy php)

I can't solve it; please help me! Thank you!

#error

502 Bad Gateway The server returned an invalid or incomplete response.

https://static.oschina.net/uploads/space/2018/0326/214155_DKqA_1441082.png





Re: Logging check response

2018-03-20 Thread PiBa-NL

Hi Andreas,
Op 20-3-2018 om 12:21 schreef Andreas Mock:

Hi all,

I don't get the http-check (ssl) up and running.

Is there a way to log the content returned by a check run
so that I can get a hint on the problem?

haproxy 1.8

Best regards
Andreas



I'm not sure if logging the content is easily possible..

But if you look at the status code on the stats page, and also take into account that http-check by default sends neither SNI nor a Host header, you should be able to mimic the requested check URL with a curl request and can likely guess what is going on..
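
For example, a check that does send both a Host header and SNI looks roughly like this (hostnames and addresses here are hypothetical; `check-sni` requires haproxy 1.8 or later):

```
backend app_ssl
    # send an explicit Host header with the check request
    option httpchk GET /alive.txt HTTP/1.0\r\nHost:\ www.example.com
    # send SNI during the check's TLS handshake
    server app1 192.0.2.10:443 ssl verify none check check-sni www.example.com
```

A manual request along the lines of `curl -vk --resolve www.example.com:443:192.0.2.10 https://www.example.com/alive.txt` should then behave like the check; comparing it with a curl request that omits the Host/SNI parts usually shows why the check fails.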


Hope it helps..

Regards,

PiBa-NL (Pieter)




Re: Order of acls not important?

2018-03-15 Thread PiBa-NL

Hi,

Op 15-3-2018 om 21:24 schreef Stefan Husch|qutic development:

I thought the acls are processed from 1 to 3,

Acl's are evaluated where they are used.


What am I doing wrong? Is the acl-position in a haproxy-config not important?

Thx, Stefan


The order of the acls themselves is not relevant.

However, you should IIRC get a warning that the http-request rule will be processed before the use_backend directive.


Regards,

PiBa-NL




Re: logger #n in ALERT messages?

2018-03-09 Thread PiBa-NL

Hi Glen,
Op 9-3-2018 om 17:43 schreef Glen Gunselman:


(Note: this is my first attempt to setup haproxy, I'm using Oracle 
Linux 6.9 and HA-Proxy version 1.5.18 2016/05/10)


How do I relate "logger #n" in ALERT messages to the configuration 
statements?


Related details:

From starting haproxy using:

sudo haproxy -d -f /etc/haproxy/haproxy.cfg | grep logger

I get ALERTs of the following format:

[ALERT] 067/101928 (39878) : sendto logger #2 failed: Resource 
temporarily unavailable (errno=11)


[ALERT] 067/101928 (39878) : sendto logger #1 failed: No such file or 
directory (errno=2)


[ALERT] 067/101928 (39878) : sendto logger #2 failed: No such file or 
directory (errno=2)


[ALERT] 067/101928 (39878) : sendto logger #1 failed: No such file or 
directory (errno=2)


…

From top of /etc/haproxy/haproxy.cfg

global

log /dev/log local0

log /dev/log local1 notice

chroot /var/lib/haproxy


You do have both log sockets?:

/dev/log

and

/var/lib/haproxy/dev/log


user haproxy

group haproxy

daemon

maxconn 1

stats socket /var/run/haproxy/haproxy.sock mode 0600 level admin

# Default SSL material locations

ca-base /etc/ssl/certs

crt-base /etc/ssl/private

tune.ssl.default-dh-param 2048

ssl-default-bind-ciphers 
kEECDH+aRSA+AES:kRSA+AES:+AES256:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL


ssl-default-bind-options no-sslv3

defaults

log global

mode http

option httplog

option dontlognull

option forwardfor

timeout connect 5000

timeout client 300s

timeout server 300s

listen stats :1936

mode http

log global

maxconn 10

timeout client 100s

timeout server 100s

timeout connect 100s

timeout queue 100s

stats enable

stats hide-version

stats refresh 30s

stats show-node

stats auth :

stats uri /haproxy?stats

(Note: I did not include the frontend, acl, use_backend and backend 
sections.There are no log statements in these sections.)


I did add the following to /etc/rsyslog.conf and messages are being 
logged to those files.


local0.* /var/log/haproxy.log

local1.* /var/log/haproxy-status.log

Thanks for any clues,

Glen
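
As a side note on the two-sockets question above: the socket inside the chroot is normally created by the syslog daemon itself. A hedged sketch for rsyslog (the file name is hypothetical; paths are taken from the config above), using the `imuxsock` module's additional-socket directive:

```
# e.g. /etc/rsyslog.d/49-haproxy.conf (hypothetical file name)
$ModLoad imuxsock
# extra socket inside haproxy's chroot, so 'log /dev/log ...' keeps
# working after haproxy chroots to /var/lib/haproxy
$AddUnixListenSocket /var/lib/haproxy/dev/log
local0.* /var/log/haproxy.log
local1.* /var/log/haproxy-status.log
```

After restarting rsyslog, both sockets should exist, and the "sendto logger ... (errno=2)" alerts for the missing socket should disappear.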


Regards,

PiBa-NL (Pieter)



Re: Dynamically adding/deleting SSL certificates

2018-03-05 Thread PiBa-NL

Hi,
Op 5-3-2018 om 19:25 schreef Willy Tarreau:

Hello Aurélien,

On Mon, Mar 05, 2018 at 03:34:11PM +0100, Aurélien Nephtali wrote:

Hello,

I'm working on a feature to add or delete SSL certificates without
reloading HAProxy and I'm facing a problem regarding the way to feed
the new certificates to the admin socket.

The certificates contain \n so the parser will trip on them and
incorrectly process the command.

Those are my ideas so far:

 - base64 the certificate content,
 - add a binary protocol to the socket to handle this special case
(intrusive, not the best idea),
 - add support for quotes.

(some months ago there was also an idea in
https://www.mail-archive.com/haproxy@formilux.org/msg23857.html)

What would be the best/upstreamable way to do ?

I tend to think (first idea out of my head) that for such file types,
we could very well consider that the command reads multiple lines and
stops at the first empty line. That's very convenient to use in scripts
and even by hand in copy-paste sessions. It would work with almost all
of the data types we have to feed via the CLI, including the maps/acls.

And a script writing there would just have to run grep -v "^$" to be
save, which is pretty easy.

In fact that's already the format used for the output : the output of
each command is defined as running till the first empty line.

I also thought about escaping end of lines with a backslash but that
becomes very painful to place in scripts.

Just my two cents, I'm also interested in people's ideas regarding this.

Thanks,
Willy

I would think the OCSP updates already do something similar with base64. That would be usable for other binary files as well? Though I guess .pem is kinda readable already and not a binary file.. unless perhaps support for pfx files gets added some day; afaik those are in binary format..


root@server:/etc/haproxy# echo "set ssl ocsp-response $(/usr/bin/base64 
-w 1 /etc/haproxy/star_mydomain_com.crt.ocsp)" | socat stdio 
unix-connect:/run/haproxy/admin.sock

OCSP Response updated!

Not that I have a strong preference, but imho it would be nice to keep the way to call similar commands the same.


Regards,

PiBa-NL (Pieter)





Re: problem in 1.8 with hosts going out of service

2018-01-24 Thread PiBa-NL

Hi Christopher,

Patch seems to work for me.
Maybe Paul can confirm as well.

Regards,
PiBa-NL / Pieter

Op 24-1-2018 om 22:02 schreef Christopher Faulet:

Le 24/01/2018 à 17:21, Paul Lockaby a écrit :
Sorry, I know this list is super busy and that there are a number of 
other more important issues that need to be worked through but I'm 
hoping one of the maintainers has been able to confirm this bug?




Hi,

Sorry Paul. As you said, we are pretty busy. And you're right to ping 
us. So, I can confirm the bug. It is a bug on threads, a deadlock, 
because of a typo.


Could you check the attached patch to confirm it fixes your problem ?

Thanks,






Re: problem in 1.8 with hosts going out of service (alive.txt+404+track)

2018-01-24 Thread PiBa-NL

Hi Paul, List,

Op 24-1-2018 om 17:21 schreef Paul Lockaby:

Sorry, I know this list is super busy and that there are a number of other more 
important issues that need to be worked through but I'm hoping one of the 
maintainers has been able to confirm this bug?
I can reproduce it indeed with your config on 1.8.3: CPU usage goes to 100% 
and stats stops responding when the alive.txt is removed.
When a backend goes down completely the track works as intended though 
(stats shows both servers marked down).
Hope this and the added info / smaller config below helps someone track the 
issue down further in the code.


Thanks,
-Paul


On Jan 17, 2018, at 10:27 AM, Paul Lockaby <plock...@uw.edu> wrote:

Ok I've tracked this problem down specifically to the usage of check tracking.

That is to say, the backend "example-api" is set to track the backend 
"example-http". When that tracking is enabled and one of the servers in the backend goes 
down then all of haproxy goes down and never recovers.

So this works:
server myhost myhost.example.com:8445 ssl ca-file 
/usr/local/ssl/certs/cacerts.cert

But this does not:
server myhost myhost.example.com:8445 track example-http/myhost ssl ca-file 
/usr/local/ssl/certs/cacerts.cert

This is definitely a regression from 1.7 because I used this feature in 1.7 
without issue.



It seems to take a combination of 3 features to trigger this issue:
    option httpchk GET /alive.txt + http-check disable-on-404 + track

Backtrace keeps showing this:
(gdb) bt
#0  0x0046b59f in srv_set_stopping ()
#1  0x004a3057 in ?? ()
#2  0x004f0eaf in process_runnable_tasks ()
#3  0x004aa13c in ?? ()
#4  0x004a9a16 in main ()

I could reduce the config to this:

frontend stats-frontend
    bind *:2999
    mode http
    log global
    stats enable
    stats uri /haproxy

frontend secured
    bind *:8080
    mode http
    acl request_api hdr_beg(Host) -i api.
    use_backend example-api if request_api
    default_backend example-http

backend example-http
    mode http
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost vhost1.pfs.local:302 check

backend example-api
    mode http
    option httpchk GET /haproxy/alive.txt
    http-check disable-on-404
    server myhost vhost1.pfs.local:303 track example-http/myhost

Regards,

PiBa-NL / Pieter




Re: bug: mworker unable to reload on USR2 since baf6ea4b

2017-12-29 Thread PiBa-NL

Hi William, Willy,

Op 29-12-2017 om 11:35 schreef William Lallemand:

On Fri, Dec 29, 2017 at 10:46:40AM +0100, Willy Tarreau wrote:

Hi William,

In fact it still doesn't fclose() the streams, which worries me a little bit
for the long term, because even though any printf() will end up being written
into /dev/null, it's still preferable to mark the FILE* as being closed and
only then reopen 0,1,2 to ensure they cannot be abusively reused. It will
also remove some confusion in strace by avoiding seeing some spurious fcntl()
or write(2, foo, strlen(foo)) being sent there caused by alerts for example.
[...]
Otherwise I'm reasonably confident that this should be enough to close
all pending issues related to the master-worker now.

Willy


I agree, it's better to merge them with fclose()


But that shouldn't be needed, as I read dup2 will close them?

"int dup2(int oldfd, int newfd);" — "closing newfd first if necessary" 
(https://linux.die.net/man/2/dup2)

Regards,
PiBa-NL / Pieter




Re: bug: mworker unable to reload on USR2 since baf6ea4b

2017-12-25 Thread PiBa-NL

Hi Lucas, William,

I've made a patch which 'I think' fixes the issue with fclose being called 
'too often?'.

Can you guys verify?

I hope it helps.. but then again, maybe not the right way .?.
(I really have too little experience with these kinds of things..)


Op 25-12-2017 om 10:25 schreef Willy Tarreau:

On Sun, Dec 24, 2017 at 07:36:54PM +0100, Willy Tarreau wrote:

The bug has been introduced in commit baf6ea4b ("BUG/MINOR: mworker:
detach from tty when in daemon mode") and affects 1.8.1 and newer
stable releases.
The discourse OP also states that the socket is likely closed by the
triple fclose() introduced in that bisected commit.

Ah bad :-( But interestingly, we were speaking about using CLOEXEC on
listeners, I think this will make all of this much easier to deal with.

I guess we
first want to clearly describe how we want each process to behave in
each case (mw+debug, mw+quiet, mw+daemon, debug, quiet, daemon). This
way we'll avoid pushing fixes which break other use cases ;-)

Regards,
Willy



And then below some 'ramblings' that also might not all make sense... :)

I've made the assumption that when running in daemon mode there should 
never be any output to the stdin/out/err files when the master-worker is 
re-executing itself, and those 'files' are already closed as the master 
process doesn't really change.. or does it?
Is this correct, or would it just introduce another bug? I'm not really 
sure how to properly check which fds are used in child processes and what 
they are used for...


As for writing out what is expected for each case, I think this would be 
it, including current results:


The short version would be like this:
    normal - warnings+errors
    debug - pollerinfo+warnings+errors+connections
    quiet - no output except errors
    daemon - no output after first startup
    master-worker - the same information as 'normal'? (also after a reload, including connections info??)



Below are the above options in various combinations and if they produce 
expected result..

mw:
    warning+error (OK)
after reload:
    warning+error (OK)

mw+quiet(in config):
    only shows FIRST-startup errors (OK I guess, or should stdout not be closed, as another re-start could be done and would want errors shown again?)
after reload:
    no output (OK) (or should it keep stdout open so it can output config errors again?)
after reload without quiet config option:
    no output (FAIL?)

mw+debug:
    polling info+warning+error (currently doesn't show debugging/headers info of connections).. (FAIL or intentional?)
after reload:
    polling info+warning+error (currently doesn't show debugging/headers info of connections) (?)


mw+daemon:
    shows first startup errors+warnings (OK)
after reload:
    no output (OK)

mw+daemon+quiet:
    shows first startup errors (OK)
after reload:
    no output (OK)

mw+daemon+debug:
    shows first warnings+errors (OK)
after reload:
    no output (OK)

debug:
    shows pollerinfo+error+warning+debugging/headers (OK)

quiet:
    shows startup errors (OK)

daemon:
    shows startup warnings+errors (OK)

daemon+debug:
    shows startup warnings+errors (OK)

daemon+quiet:
    shows first startup errors (OK)


So removing 'quiet' from the config does something unwanted for MW.
And another possible issue I found: when MW is reloaded with USR2 and the 
config contains an error, then after that is corrected and another USR2 is 
sent, the -x is not used when launching the new worker.. Seems to work OK 
afterwards though.. Is this intentional?



Anyhow, wish you all good Christmas day(s) and a happy new year.
Regards,
PiBa-NL / Pieter

From 49272b2c0bafb413f0fb89717f17c784c2947a4f Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Mon, 25 Dec 2017 21:03:31 +0100
Subject: [PATCH] BUG/MEDIUM: mworker: avoid closing stdin/stdout/stderr file
 descriptors multiple times in the same master process

This makes sure that a frontend socket that gets created after initialization 
won't be closed when the mworker gets re-executed.
---
 src/haproxy.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index ffd7ea0..0666ad0 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2579,10 +2579,19 @@ int main(int argc, char **argv)
 	signal_register_fct(SIGTTIN, sig_listen, SIGTTIN);
 
 	/* MODE_QUIET can inhibit alerts and warnings below this line */
-
-	if ((global.mode & MODE_QUIET) && !(global.mode & MODE_VERBOSE)) {
-		/* detach from the tty */
-		fclose(stdin); fclose(stdout); fclose(stderr);
+
+	if (getenv("HAPROXY_MWORKER_REEXEC") != NULL) {
+		// either stdin/out/err are already closed or should stay as they are.
+		if ((global.mode & MODE_DAEMON)) {
+			// daemon mode re-executing, stdin/stdout/stderr are already closed so keep quiet
+ 

Re: Status change from MAINT to UP

2017-12-13 Thread PiBa-NL

Hi Johan,

Op 13-12-2017 om 17:31 schreef Johan Hendriks:

When i use the show stat command I get different results?


Just a guess: are you using nbproc > 1?
Are multiple (old?) haproxy processes running?

Perhaps including the used config could help diagnose.
And 'haproxy -vv' is always appreciated.

Regards,
PiBa-NL



Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread PiBa-NL

Hi Christopher, Willy,

Op 7-12-2017 om 19:33 schreef Willy Tarreau:

On Thu, Dec 07, 2017 at 04:27:16PM +0100, Christopher Faulet wrote:

Honestly, I don't know which version is the best.

Just let me know guys :-)
imho Christopher's patch is smaller and probably easier to maintain and 
eventually remove, without adding (unneeded) code to 
set_server_check_status(). Though it is a bit less obvious to me that it 
will have the same effect, it works just as well.



Email alerts should
probably be rewritten to not use the checks. This was the only solution to
do connections by hand when Simon implemented it. That's not true anymore.

I agree and I think I was the one asking Simon to do it like this by then
even though he didn't like this solution. That was an acceptable tradeoff
in my opinion, with very limited impact on existing code. Now with applets
being much more flexible we could easily reimplement a more complete and
robust SMTP engine not relying on hijacking the tcp-check engine anymore.

Willy


An 'smtp engine' for sending email-alerts might be nice eventually, but 
that is not easily done 'today' (not by me anyhow). (Would it group 
messages together if multiple are created within a short time-span?)


As for the current issue / patch, i prefer the solution Christopher 
found/made.


Made a new version of it with a bit of extra comments inside the code, 
removed an unrelated white-space change, and added a matching patch 
description.
Or perhaps Christopher can create it under his own name? Either way is 
fine for me. :)


Regards,
PiBa-NL / Pieter

From 3129e1ae21e41a026f6d067b3658f6643835974c Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Wed, 6 Dec 2017 01:35:43 +0100
Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
 email-alert task

This avoids a possible 100% cpu usage deadlock on an EMAIL_ALERTS_LOCK and avoids 
sending lots of emails when 'option log-health-checks' is used. Changing the 
server state and possibly queuing a new email while processing the email alert 
is avoided by setting check->status to HCHK_STATUS_UNKNOWN, which will exit 
set_server_check_status(..) early.

This needs to be backported to 1.8.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index eaf84a2..3a6f020 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -3145,7 +3145,7 @@ static struct task *process_email_alert(struct task *t)
 		t->expire = now_ms;
 		check->server = alert->srv;
 		check->tcpcheck_rules = &alert->tcpcheck_rules;
-		check->status = HCHK_STATUS_INI;
+		check->status = HCHK_STATUS_UNKNOWN; // the UNKNOWN status is used to exit set_server_check_status(.) early
 		check->state |= CHK_ST_ENABLED;
 	}
 
-- 
2.10.1.windows.1



[PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-05 Thread PiBa-NL

Hi List, Simon and Baptiste,

Sending to both of you guys as it's both tcp-check and email related and 
you are the maintainers of those parts.

Patch subject+content basically says it all (I hope).

It is intended to fixes yesterdays report: 
https://www.mail-archive.com/haproxy@formilux.org/msg28158.html


Please let me know if it is OK, or should be done differently.

Thanks in advance,
PiBa-NL / Pieter
From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Wed, 6 Dec 2017 01:35:43 +0100
Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
 email-alert task

This avoids a possible 100% cpu usage deadlock on an EMAIL_ALERTS_LOCK and avoids 
sending lots of emails when 'option log-health-checks' is used.
Changing the server state and possibly queuing a new email while processing 
the email alert is avoided by checking whether the check task is being 
processed by process_email_alert.

This needs to be backported to 1.8.
---
 src/checks.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/checks.c b/src/checks.c
index eaf84a2..55bfde2 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);
 
 static struct pool_head *pool_head_email_alert   = NULL;
 static struct pool_head *pool_head_tcpcheck_rule = NULL;
+static struct task *process_email_alert(struct task *t);
 
 
 static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
@@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
  */
 static void set_server_check_status(struct check *check, short status, const char *desc)
 {
+	if (check->task->process == process_email_alert)
+		return; // email alerts should not change the status of the server
+
 	struct server *s = check->server;
 	short prev_status = check->status;
 	int report = 0;
-- 
2.10.1.windows.1



haproxy 1.8.1 email-alert with log-health-checks, 100% cpu usage / mailbomb

2017-12-04 Thread PiBa-NL

Hi List,

Hereby a seemingly new case of 100% cpu usage / mailbomb on FreeBSD 11.1.

Below seems to be the (close to) minimal config; there is no mailserver, 
and no webserver listening on those ports.. The stats page is not 
requested. (But without it haproxy won't start, as it doesn't see any 
binds then..)


If a mailserver is listening, then 800+ mails are received..

Regards,
PiBa-NL / Pieter

defaults
    option log-health-checks
listen HAProxyLocalStats
    bind 127.0.0.1:42200 name localstats
    mode http
    stats enable
mailers globalmailers
    mailer ex01 127.0.0.1:3325
backend ServerTest_http_ipv4
    mode            http
    email-alert mailers            globalmailers
    email-alert level            info
    email-alert from            haproxy@pfsense.local
    email-alert to            m...@me.tld
    server            ServerTest 127.0.0.1:33443 check inter 1

root@:~ # uname -a
FreeBSD  11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 
UTC 2017 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC 
amd64


It produces like 600+ lines of console output; I've numbered some of the 
lines and skipped most repeating ones:

root@:~ # haproxy -f /root/hapconf.conf
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection error during SSL handshake (Broken pipe)", check duration: 0ms, status: 0/2 DOWN.
[WARNING] 337/193939 (44649) : Server ServerTest_http_ipv4/ServerTest is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 337/193939 (44649) : backend 'ServerTest_http_ipv4' has no server available!
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 1ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
10[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.

... repeating same line over and over...
203[WARNING] 337/193939 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.
204[WARNING] 337/193942 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 3059ms, status: 0/1 DOWN.
205[WARNING] 337/193942 (44649) : Health check for server ServerTest_http_ipv4/ServerTest failed, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms, status: 0/1 DOWN.

... repeating same line over and over...
403[WARNING]

Re: [PATCH] BUG/MINOR: when master-worker is in daemon mode, detach from tty

2017-11-29 Thread PiBa-NL

Hi William,

When you have time, please take a look below & attached :) .

Op 29-11-2017 om 1:28 schreef William Lallemand:

Hi Pieter,

diff --git a/src/haproxy.c b/src/haproxy.c
index c3c8281..a811577 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2648,6 +2648,13 @@ int main(int argc, char **argv)
}
  
  	if (global.mode & (MODE_DAEMON | MODE_MWORKER)) {

+		if ((!(global.mode & MODE_QUIET) || (global.mode & MODE_VERBOSE)) &&
+		    ((global.mode & (MODE_DAEMON | MODE_MWORKER)) == (MODE_DAEMON | MODE_MWORKER))) {
+			/* detach from the tty, this is required to properly daemonize. */
+			fclose(stdin); fclose(stdout); fclose(stderr);
+			global.mode &= ~MODE_VERBOSE;
+			global.mode |= MODE_QUIET; /* ensure that we won't say anything from now */
+		}
 		struct proxy *px;
 		struct peers *curpeers;
 		int ret = 0;

I need to check that again later, in my opinion it should be done after the
pipe() so we don't inherit the 0 and 1 FDs in the pipe,
FDs for the master-worker pipe can still be 0 and 1 if running in quiet 
mode, as stdin/stdout/stderr are then still closed before creating the 
pipe. Should the pipe be created earlier?
I've moved the code to just before the mworker_wait() in the new attached 
patch. This should allow (all?) possible warnings to be output before 
closing stdX, and it still 'seems' to work properly..

we also need to rely on
setsid() to do a proper tty detach.
I've added a setsid(), but I must admit I have no clue what it's doing 
exactly...

  This is already done in -D mode without -W, maybe
this part of the code should me moved elsewhere, but we have to be careful not
to break the daemon mode w/o mworker.

I've tried most combinations of parameters like these:
1: -W
2: -W -q
3: -D -W
4: -D -W -q
5: -D
6: -D -q
7: -q
8: (without parameters)
Both by starting directly from an ssh console, and by running from my PHP 
script that reads the stdout/stderr output. And reloading it with USR2 
in the -W mode..
It seemed that the expected output, or lack thereof, was being produced in 
all cases.
But it preferably also needs to be tested under systemd itself, as that 
is the intended use-case, which I did not test at all :/ ..
Also I did not change the config while running to include/exclude the 
'quiet' or 'daemon' option or something like that. Seems like an odd 
thing to do..


I'm not sure if the attached patch is OK for you like this, or needs to 
be implemented completely differently.
I have made and tried to test the changed patch with the above cases, but 
am sure there are many things / combinations with other features I have 
not included..
If I need to change it slightly somehow please let me know; if you need 
time to look into it further, I can certainly wait :) I do not 'need' 
the feature urgently, or perhaps won't need it at all..


Anyhow, when you have time to look into it, I look forward to your 
feedback :). Thanks in advance.


Regards,
PiBa-NL / Pieter
From c103dbd7837d49721ccadfb1aee9520e821a020f Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Tue, 28 Nov 2017 23:26:08 +0100
Subject: [PATCH] BUG/MINOR: when master-worker is in daemon mode, detach from
 tty

This allows a calling script to show the first startup output and know when to 
stop reading from stdout so haproxy can daemonize.
---
 src/haproxy.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index 891a021..702501d 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2749,7 +2749,7 @@ int main(int argc, char **argv)
 			//lseek(pidfd, 0, SEEK_SET);  /* debug: emulate eglibc bug */
 			close(pidfd);
 		}
-
+
 		/* We won't ever use this anymore */
 		free(global.pidfile); global.pidfile = NULL;
 
@@ -2757,6 +2757,16 @@ int main(int argc, char **argv)
 		if (global.mode & MODE_MWORKER) {
 			mworker_cleanlisteners();
 			deinit_pollers();
+
+			if ((!(global.mode & MODE_QUIET) || (global.mode & MODE_VERBOSE)) &&
+			    ((global.mode & (MODE_DAEMON | MODE_MWORKER)) == (MODE_DAEMON | MODE_MWORKER))) {
+				/* detach from the tty, this is required to properly daemonize. */
+				fclose(stdin); fclose(stdout); fclose(stderr);
+				global.mode &= ~MODE_VERBOSE;
+				global.mode |= MODE_QUIET; /* ensure that we won't say anything from now */
+				setsid();
+			}
+  

[PATCH] BUG/MINOR: when master-worker is in daemon mode, detach from tty

2017-11-28 Thread PiBa-NL

Hi List,

Made a patch that makes the master-worker detach from the tty when it is 
combined with daemon mode. This allows a script to start haproxy in daemon 
mode: stdout is closed, so the calling process knows when to stop reading 
from it, and the master can properly daemonize.


This is intended to solve my previously reported 'issue': 
https://www.mail-archive.com/haproxy@formilux.org/msg27963.html


Let me know if something about it needs fixing..

Thanks

PiBa-NL / Pieter



From 06224a3fcf7b39bf1bf0128a5bac3d0209bc2aab Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Tue, 28 Nov 2017 23:26:08 +0100
Subject: [PATCH] [PATCH] BUG/MINOR: when master-worker is in daemon mode,
 detach from tty

This allows a calling script to show the first startup output and know when to 
stop reading from stdout so haproxy can daemonize.
---
 src/haproxy.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/haproxy.c b/src/haproxy.c
index c3c8281..a811577 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2648,6 +2648,13 @@ int main(int argc, char **argv)
 	}
 
 	if (global.mode & (MODE_DAEMON | MODE_MWORKER)) {
+		if ((!(global.mode & MODE_QUIET) || (global.mode & MODE_VERBOSE)) &&
+		    ((global.mode & (MODE_DAEMON | MODE_MWORKER)) == (MODE_DAEMON | MODE_MWORKER))) {
+			/* detach from the tty, this is required to properly daemonize. */
+			fclose(stdin); fclose(stdout); fclose(stderr);
+			global.mode &= ~MODE_VERBOSE;
+			global.mode |= MODE_QUIET; /* ensure that we won't say anything from now */
+		}
 		struct proxy *px;
 		struct peers *curpeers;
 		int ret = 0;
-- 
2.10.1.windows.1



[PATCH] BUG/MINOR: Check if master-worker pipe getenv succeeded, also allow pipe fd 0 as valid.

2017-11-28 Thread PiBa-NL

Hi List, Willy / Willliam,

A patch I came up with that might make it a little 'safer' with regard 
to getenv and its return value, or possible lack thereof.. I'm not sure 
if it will ever happen, but if it does it won't fail on a null pointer or 
an empty-string conversion to a long value.. Though an arithmetic conversion 
error could still happen if the value is present but not a number.. but 
well, that would be a really odd case.


There are a few things I'm not sure about though.

- What would/could possibly break if the mworker_pipe values are left as -1 
and the process continues and tries to use them?

- Won't the rd/wr char* values leak memory?

Anyhow, the biggest part of the bug to notice is the sometimes wrongful 
alert when the fd is actually '0'...

If anything needs to be changed let me know.

Regards,

PiBa-NL / Pieter


From 486d7c759af03f9193ae3e38005d8325ab069b37 Mon Sep 17 00:00:00 2001
From: PiBa-NL <pba_...@yahoo.com>
Date: Tue, 28 Nov 2017 23:22:14 +0100
Subject: [PATCH] [PATCH] BUG/MINOR: Check if master-worker pipe getenv
 succeeded, also allow pipe fd 0 as valid.

On FreeBSD in quiet mode stdin/stdout/stderr are closed, which lets the 
mworker_pipe use fd 0 and fd 1.
---
 src/haproxy.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index 891a021..c3c8281 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2688,9 +2688,15 @@ int main(int argc, char **argv)
 					free(msg);
 				}
 			} else {
-				mworker_pipe[0] = atol(getenv("HAPROXY_MWORKER_PIPE_RD"));
-				mworker_pipe[1] = atol(getenv("HAPROXY_MWORKER_PIPE_WR"));
-				if (mworker_pipe[0] <= 0 || mworker_pipe[1] <= 0) {
+				mworker_pipe[0] = -1;
+				mworker_pipe[1] = -1;
+				char* rd = getenv("HAPROXY_MWORKER_PIPE_RD");
+				char* wr = getenv("HAPROXY_MWORKER_PIPE_WR");
+				if (rd && wr && strlen(rd) > 0 && strlen(wr) > 0) {
+					mworker_pipe[0] = atol(rd);
+					mworker_pipe[1] = atol(wr);
+				}
+				if (mworker_pipe[0] < 0 || mworker_pipe[1] < 0) {
 					ha_warning("[%s.main()] Cannot get master pipe FDs.\n", argv[0]);
 				}
 			}
-- 
2.10.1.windows.1



Re: haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-28 Thread PiBa-NL

Hi Christopher / Willy,

On Tue, Nov 28, 2017 at 10:28:20AM +0100, Christopher Faulet wrote:


Here is a patch that should fix the deadlock. Could you confirm it fixes
your bug ?

Fix confirmed.

Thanks,
PiBa-NL / Pieter



haproxy-1.8.0, sending a email-alert causes 100% cpu usage, FreeBSD 11.1

2017-11-27 Thread PiBa-NL

Hi List,

I thought I 'reasonably' tested some of 1.8.0's options.
Today I put it into 'production' on my secondary cluster node and noticed 
it takes 100% CPU... I guess I should have tried such a thing last week.
My regular config with 10 frontends and 13 servers total seems to start up 
fine when 'email-alert level' is set to 'emerg'; it doesn't need to send a 
mail then..

Anyhow, below some gdb and console output.
The config that reproduces it is pretty simple; no new features used or anything.
Though the server is 'down', so it is trying to send a mail for that.. 
that never seems to happen though.. no mail is received.

I tried using nokqueue and nopoll, but that did not result in any 
improvement..

Anything else I can provide?

Regards,
PiBa-NL / Pieter

haproxy -f /root/hap.conf -V
[WARNING] 330/204605 (14771) : config : missing timeouts for frontend 'TestMailFront'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 330/204605 (14771) : config : missing timeouts for backend 'TestMailBack'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Note: setting global.maxconn to 2000.
Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result FAILED
Total: 3 (2 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe
Using kqueue() as the polling mechanism.
[WARNING] 330/204608 (14771) : Server TestMailBack/TestServer is DOWN, reason: Layer4 timeout, check duration: 2009ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

[ALERT] 330/204608 (14771) : backend 'TestMailBack' has no server available!

Complete configuration that reproduces the issue:

mailers globalmailers
    mailer ex01 192.168.0.40:25
frontend TestMailFront
    bind :88
    default_backend  TestMailBack
backend TestMailBack
    server TestServer 192.168.0.250:80 check
    email-alert mailers            globalmailers
    email-alert level            info
    email-alert from            haproxy@me.local
    email-alert to            m...@me.tld
    email-alert myhostname        pfs


root@:~ # haproxy -vv
HA-Proxy version 1.8.0 2017/11/26
Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -pipe -g -fstack-protector -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe

root@:~ #

root@:~ # /usr/local/bin/gdb --pid 14771
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-25 Thread PiBa-NL

Hi Willy,

Op 25-11-2017 om 8:33 schreef Willy Tarreau:

Hi Pieter,

On Tue, Nov 21, 2017 at 04:34:16PM +0100, PiBa-NL wrote:

Hi William,

I was intending to use the new feature to pass open sockets to the next
haproxy process.
And thought that master-worker is a 'requirement' to make that work as it
would manage the transferal of sockets.
Now i'm thinking that's not actually how it's working at all..
I could 'manually' pass the  -x /haproxy.socket to the next process and make
it take over the sockets that way i guess.?

Yes it's the intent indeed. Master-worker and -x were developed in parallel
and then master-worker was taught to be compatible with this, but the primary
purpose of -x is to pass FDs without needing MW.
Great, i suppose i'll need to make a few (small) changes to implement 
this then in the package i maintain for pfSense, probably easier than 
changing it to use master-worker anyhow :).



(How does this combine with nbproc>1 and multiple stats sockets bound to
separate processes?)

There's a special case for this. Normally, as you know, listening FDs not
used in a process are closed after the fork(). Now by simply setting
"expose-fd listeners" on your stats socket, the process running the stats
socket will keep *all* listening FDs open in order to pass them upon
invocation from the CLI. Thus, even with nbproc>1, sockets split across
different processes and a single stats socket, -x will retrieve all
listeners at once.

Okay, this will work well then :).
Was thinking if i'm going to do it myself (pass the -x argument), i need 
to make sure i do it properly.
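As a hedged sketch of the setup Willy describes (socket path, pid-file path and nbproc value are illustrative, not taken from these mails):

```
global
    nbproc 2
    # keep ALL listening FDs open in the process serving this socket,
    # so a freshly started haproxy can fetch them via -x on reload
    stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners
```

A reload would then look something like `haproxy -f haproxy.cfg -x /var/run/haproxy.sock -sf $(cat /var/run/haproxy.pid)`.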

Though i can imagine a future where the master would maybe provide some
aggregated stats and management socket to perform server status changes.
Perhaps i should step away from using master-worker for the moment.

I think you don't need it for now and you're right that we'd all like it
to continue to evolve. Simon had done an amazing work on this subject in
the past, making a socket server and stuff like this but by then the
internal architecture was not ready and we faced many issues so we had to
drop it. But despite this there was already a huge motivation in trying
to get this to work. This was during 1.5-dev5! Since then, microservices
have emerged with the need for more common updates, the need to aggregate
information has increased, etc. So yes, I think that the reasons that
motivated us to try this 7 years ago are still present and have been
emphasized over time. Maybe in a few years the master-worker mode will
be the only one supported if it provides extra facilities such as being
a CLI gateway for all processes or collecting stats. Let's just not rush
and use it for what it is for now : a replacement for the systemd-wrapper.
Ok clear, and thanks for the history involved. I'm not using the 
systemd-wrapper, so no need for me to use its replacement. I just 
thought it looked fancy to use and maybe 'future proof', though it is 
too early to really tell.. no more 'restarting' of processes but just 
sending a 'reload' request did seem like a better design (though in the 
background the same restarting of processes still happens..). This, with 
the added (wrongful) thought that it was required for socket transferal, 
made me think: let's give it a try :).

However the -W -D combination doesn't seem to (fully) work as i expected,
responses below..

As mentioned in the other thread, there's an issue on this and kqueue
that I have no idea about. I'm suspecting an at_exit() doing nasty stuff
somewhere and some head-scratching will be needed (I hate the principle
of at_exit() as it cheats on the stuff you believe when reading it).
Ok, thanks for looking into it. No need to rush as i can work with rc4 
as it is.. At least on my test machine..

I would prefer to do both 'catch' startup errors and daemonize haproxy.
In my previous mail i'm starting it with -D, and the -W is equivalent of the
global master-worker option in the config, so it 'should' daemonize right?
But it did not(properly?), ive just tried with both startup parameters -D -W
the result is the same.
The master with pid 3061 is running under the system /sbin/init pid 1;
however, the pid 2926 also keeps running. I would want/expect 2926 to exit
when startup is complete.

I just also noted that the 2926 actually becomes a 'zombie'.?. that cant be
good right?

It's *possible* that this process either still had a connection and couldn't
quit, or that a bug made it believe it still had a connection. Given that you
had a very strange behaviour with -D -W, let's consider there's an unknown
issue there for now and that it could explain a lot of strange behaviours.
No 'connections' to the process iirc; it's an isolated test environment, and 
it seems related to the stdout/stderr output.. I did have one of these 
handles open to these outputs, as i described in the next mail with the 
little php example code..



It seems to work properly the way you describe.. (when properly daemonized..)

Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon mode does not bind.?.

2017-11-25 Thread PiBa-NL

Hi Willy,

Op 25-11-2017 om 8:08 schreef Willy Tarreau:

Hi Pieter,

We found that it's the first fork_poller() placed after the fork which
fixed the issue but that makes absolutely no sense since the next thing
done is exit(), which also results in closing the kqueue fd! So unless
there's something in at_exit() having an effect there, it really doesn't
make much sense and could imply something more subtle. Thus my question
is : do you have a real use case of master-worker+daemon on freebsd or
can we release like this and try to fix it after the release ? It's
about the last rock in the shoe we have.

Thanks,
Willy


Personally i do not need master-worker at present time..
But running it with 'quiet' configuration option or -q startup parameter 
is something that imho does need to be checked though. It seems to 
always die when doing so after sending a USR2, also without using daemon 
mode.!. (i didn't quite specify that fully in the other mail/thread..)


-- Some background info --
My usecase mainly revolves around running haproxy on pfSense (a FreeBSD 
firewall distribution), im using and maintaining the haproxy 'package' 
for it.
All services there are 'managed' by php and shell scripts. When users 
modify the configuration in the webgui and press 'apply' i need to 
restart haproxy and show any warnings/alerts that might have been returned.

This has worked and still works fine without requiring master-worker.
So no problem or missing advantages for myself or users of that package 
if haproxy 1.8 gets released as it currently is.
I'm not sure if other folks are using some service management tool like 
systemd on FreeBSD... But i guess time will tell, i'm sure 1.8 wont be 
the last release ever so fixes when required will come :).


Regards,
PiBa-NL / Pieter




Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-22 Thread PiBa-NL

Hi William,

Found the 'crash?' i was talking about earlier again.
Start haproxy like this:
    haproxy -f /root/hap.conf -W -D -dk -q
Then issue a USR2 to the master (the first parent/zombie is already 
gone, so that's good imho..).
It will temporarily start new workers and then immediately everything 
stops running..


Anyhow looking forward to your replies.

Regards,
PiBa-NL

Op 22-11-2017 om 17:48 schreef PiBa-NL:

Hi William,

I'm not 100% sure but i think the stdout and errout files should be 
closed before process exit?

It seems to me they are not.

At least with the following php script it fails to 'read' where the 
output from haproxy ends and it keeps waiting.

Without the -W it succeeds.

Could you check?

Regards,
PiBa-NL

#!/usr/local/bin/php-cgi -f
<?php
$descriptorspec = array(
   0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
   1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
   2 => array("pipe", "w")   // stderr is a pipe that the child will write to
);

$cwd = '/root';
$env = array();

$process = proc_open('haproxy -f hap.conf -W -D -dk', $descriptorspec, 
$pipes, $cwd, $env);

echo "\n  START\n";

echo "\n  procstatus\n";
print_r(proc_get_status($process));

if (is_resource($process)) {
    echo "\n  ERROUT\n";
    while (false !== ($char = fgetc($pipes[2]))) {
        echo "$char";
    }

    echo "\n  STDOUT\n";
    while (false !== ($char = fgetc($pipes[1]))) {
        echo "$char";
    }

    echo "\n DONE reading..";
    fclose($pipes[0]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $return_value = proc_close($process);
    echo "command returned $return_value\n";
} else {
    echo 'FAIL';
}


Op 21-11-2017 om 16:34 schreef PiBa-NL:

Hi William,

I was intending to use the new feature to pass open sockets to the 
next haproxy process.
And thought that master-worker is a 'requirement' to make that work 
as it would manage the transferal of sockets.

Now i'm thinking that's not actually how it's working at all..
I could 'manually' pass the  -x /haproxy.socket to the next process 
and make it take over the sockets that way i guess.? (How does this 
combine with nbproc>1 and multiple stats sockets bound to separate 
processes?)


Though i can imagine a future where the master would maybe provide 
some aggregated stats and management socket to perform server status 
changes.

Perhaps i should step away from using master-worker for the moment.

However the -W -D combination doesn't seem to (fully) work as i 
expected, responses below..


Op 21-11-2017 om 2:59 schreef William Lallemand:

the master-worker was designed in a way to
replace the systemd-wrapper, and the systemd way to run a daemon is 
to keep it
on the foreground and pipe it to systemd so it can catch the errors 
on the

standard output.

However, it was also designed for normal people who want to daemonize,
so you can combine -W with -D which will daemonize the master.


I'm not sure of getting the issue there, the errors are still 
displayed upon

startup like in any other haproxy mode, there is really no change here.
I assume your only problem with your script is the daemonize that 
you can

achieve by combining -W and -D.

I would prefer to do both 'catch' startup errors and daemonize haproxy.
In my previous mail i'm starting it with -D, and the -W is equivalent 
of the global master-worker option in the config, so it 'should' 
daemonize right?
But it did not(properly?), ive just tried with both startup 
parameters -D -W the result is the same.
The master with pid 3061 is running under the system /sbin/init pid 
1; however, the pid 2926 also keeps running. I would want/expect 2926 
to exit when startup is complete.


I just also noted that the 2926 actually becomes a 'zombie'.?. that 
cant be good right?





A kill -1 itself wont tell if a new configured bind cannot find the
interface address to bind to? and a -c before hand wont find such a 
problem.

Upon a reload (SIGUSR2 on the master) the master will try to parse the
configuration again and start the listeners. If it fails, the master 
will
reexec itself in a wait() mode, and won't kill the previous workers, 
the
parsing/bind error should be displayed on the standard output of the 
master.
I think i saw it exit but cannot reproduce it anymore with the 
scenario of a wrong ip in the bind.. I might have issued a wrong 
signal there when i tried (a USR1 instead of a USR2 or something. ).
It seems to work properly the way you describe.. (when properly 
daemonized..)

Sorry for the noise on this part..

The end result that nothing is running and the error causing that
however should be 'caught' somehow for logging.?. should haproxy 
itself

log it to syslogs? but how will the startup script know to notify the
user of a failure?
Well, the master doesn't do syslog, because there might be no syslog in your
configuration. I think you should try the systemd way and log the standard
output.

Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-22 Thread PiBa-NL

Hi William,

I'm not 100% sure but i think the stdout and errout files should be 
closed before process exit?

It seems to me they are not.

At least with the following php script it fails to 'read' where the 
output from haproxy ends and it keeps waiting.

Without the -W it succeeds.

Could you check?

Regards,
PiBa-NL

#!/usr/local/bin/php-cgi -f
<?php
$descriptorspec = array(
   0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
   1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
   2 => array("pipe", "w")   // stderr is a pipe that the child will write to
);

$cwd = '/root';
$env = array();

$process = proc_open('haproxy -f hap.conf -W -D -dk', $descriptorspec, 
$pipes, $cwd, $env);

echo "\n  START\n";

echo "\n  procstatus\n";
print_r(proc_get_status($process));

if (is_resource($process)) {
    echo "\n  ERROUT\n";
    while (false !== ($char = fgetc($pipes[2]))) {
        echo "$char";
    }

    echo "\n  STDOUT\n";
    while (false !== ($char = fgetc($pipes[1]))) {
        echo "$char";
    }

    echo "\n DONE reading..";
    fclose($pipes[0]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    $return_value = proc_close($process);
    echo "command returned $return_value\n";
} else {
    echo 'FAIL';
}


Op 21-11-2017 om 16:34 schreef PiBa-NL:

Hi William,

I was intending to use the new feature to pass open sockets to the 
next haproxy process.
And thought that master-worker is a 'requirement' to make that work as 
it would manage the transferal of sockets.

Now i'm thinking that's not actually how it's working at all..
I could 'manually' pass the  -x /haproxy.socket to the next process 
and make it take over the sockets that way i guess.? (How does this 
combine with nbproc>1 and multiple stats sockets bound to separate 
processes?)


Though i can imagine a future where the master would maybe provide 
some aggregated stats and management socket to perform server status 
changes.

Perhaps i should step away from using master-worker for the moment.

However the -W -D combination doesn't seem to (fully) work as i 
expected, responses below..


Op 21-11-2017 om 2:59 schreef William Lallemand:

the master-worker was designed in a way to
replace the systemd-wrapper, and the systemd way to run a daemon is 
to keep it
on the foreground and pipe it to systemd so it can catch the errors 
on the

standard output.

However, it was also designed for normal people who want to daemonize,
so you can combine -W with -D which will daemonize the master.


I'm not sure of getting the issue there, the errors are still 
displayed upon

startup like in any other haproxy mode, there is really no change here.
I assume your only problem with your script is the daemonize that you 
can

achieve by combining -W and -D.

I would prefer to do both 'catch' startup errors and daemonize haproxy.
In my previous mail i'm starting it with -D, and the -W is equivalent 
of the global master-worker option in the config, so it 'should' 
daemonize right?
But it did not(properly?), ive just tried with both startup parameters 
-D -W the result is the same.
The master with pid 3061 is running under the system /sbin/init pid 1; 
however, the pid 2926 also keeps running. I would want/expect 2926 to 
exit when startup is complete.


I just also noted that the 2926 actually becomes a 'zombie'.?. that 
cant be good right?





A kill -1 itself wont tell if a new configured bind cannot find the
interface address to bind to? and a -c before hand wont find such a 
problem.

Upon a reload (SIGUSR2 on the master) the master will try to parse the
configuration again and start the listeners. If it fails, the master 
will

reexec itself in a wait() mode, and won't kill the previous workers, the
parsing/bind error should be displayed on the standard output of the 
master.
I think i saw it exit but cannot reproduce it anymore with the 
scenario of a wrong ip in the bind.. I might have issued a wrong 
signal there when i tried (a USR1 instead of a USR2 or something. ).
It seems to work properly the way you describe.. (when properly 
daemonized..)

Sorry for the noise on this part..

The end result that nothing is running and the error causing that
however should be 'caught' somehow for logging.?. should haproxy itself
log it to syslogs? but how will the startup script know to notify the
user of a failure?
Well, the master doesn't do syslog, because there might be no syslog in 
your
configuration. I think you should try the systemd way and log the 
standard

output.
I don't want to use systemd, but i do want to log standard output, at 
least during initial startup..

Would it be possible when starting haproxy with -sf  it would tell
if the (original?) master was successful in reloading the config /
starting new workers or how should this be done?
That may be badly documented, but you are not supposed to use -sf with the
master-worker; you just have to send the USR2 signal to the master and it
will parse the configuration again, launch new workers and smoothly kill
the previous ones.

Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-21 Thread PiBa-NL

Hi William,

I was intending to use the new feature to pass open sockets to the next 
haproxy process.
And thought that master-worker is a 'requirement' to make that work as 
it would manage the transferal of sockets.

Now i'm thinking that's not actually how it's working at all..
I could 'manually' pass the  -x /haproxy.socket to the next process and 
make it take over the sockets that way i guess.? (How does this combine 
with nbproc>1 and multiple stats sockets bound to separate processes?)


Though i can imagine a future where the master would maybe provide some 
aggregated stats and management socket to perform server status changes.

Perhaps i should step away from using master-worker for the moment.

However the -W -D combination doesn't seem to (fully) work as i 
expected, responses below..


Op 21-11-2017 om 2:59 schreef William Lallemand:

the master-worker was designed in a way to
replace the systemd-wrapper, and the systemd way to run a daemon is to keep it
on the foreground and pipe it to systemd so it can catch the errors on the
standard output.

However, it was also designed for normal people who want to daemonize,
so you can combine -W with -D which will daemonize the master.



I'm not sure of getting the issue there, the errors are still displayed upon
startup like in any other haproxy mode, there is really no change here.
I assume your only problem with your script is the daemonize that you can
achieve by combining -W and -D.

I would prefer to do both 'catch' startup errors and daemonize haproxy.
In my previous mail i'm starting it with -D, and the -W is equivalent of 
the global master-worker option in the config, so it 'should' daemonize 
right?
But it did not(properly?), ive just tried with both startup parameters 
-D -W the result is the same.
The master with pid 3061 is running under the system /sbin/init pid 1; 
however, the pid 2926 also keeps running. I would want/expect 2926 to exit 
when startup is complete.


I just also noted that the 2926 actually becomes a 'zombie'.?. that cant 
be good right?





A kill -1 itself wont tell if a new configured bind cannot find the
interface address to bind to? and a -c before hand wont find such a problem.

Upon a reload (SIGUSR2 on the master) the master will try to parse the
configuration again and start the listeners. If it fails, the master will
reexec itself in a wait() mode, and won't kill the previous workers, the
parsing/bind error should be displayed on the standard output of the master.
I think i saw it exit but cannot reproduce it anymore with the scenario 
of a wrong ip in the bind.. I might have issued a wrong signal there 
when i tried (a USR1 instead of a USR2 or something. ).

It seems to work properly the way you describe.. (when properly daemonized..)
Sorry for the noise on this part..

The end result that nothing is running and the error causing that
however should be 'caught' somehow for logging.?. should haproxy itself
log it to syslogs? but how will the startup script know to notify the
user of a failure?

Well, the master doesn't do syslog, because there might be no syslog in your
configuration. I think you should try the systemd way and log the standard
output.
I don't want to use systemd, but i do want to log standard output, at 
least during initial startup..

Would it be possible when starting haproxy with -sf  it would tell
if the (original?) master was successful in reloading the config /
starting new workers or how should this be done?

That may be badly documented, but you are not supposed to use -sf with the 
master-worker; you just have to send the USR2 signal to the master and it 
will parse the configuration again, launch new workers and smoothly kill the
previous ones.

Unfortunately signals are asynchronous, and we don't have a way yet to return
a bad exit code upon reload. But we might implement a synchronous
configuration notification in the future, using the admin socket for example.
Being able to signal the master to reload over an admin socket and 
get 'feedback' about the result would likely also solve my 'reload 
feedback' problem.
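Until such a synchronous channel exists, one illustrative workaround (pid-file path hypothetical, and only a heuristic, not something from this thread) is to compare the master's children before and after the signal:

```
master=$(cat /var/run/haproxy.pid)
old_workers=$(pgrep -P "$master")
kill -USR2 "$master"
sleep 2
new_workers=$(pgrep -P "$master")
# if the worker set did not change, the reload most likely failed
[ "$old_workers" != "$new_workers" ] || echo "reload may have failed" >&2
```

This cannot report the parse error itself, only that no new workers appeared.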

Lets consider that a feature request :).
Though maybe i shouldn't be using master-worker at all for the moment..

Currently a whole new set of master-worker processes seems to take over..

Well, I supposed that's because you launched a new master-worker with -sf, it's
not supposed to be used that way but it should work too if you don't mind
having a new PID.
I kinda expected this to indeed be 'as intended' -sf will fully replace 
the old processes.


Thanks for your reply.

Regards,
PiBa-NL / Pieter



Re: 4xx statistics made useless through health checks?

2017-11-21 Thread PiBa-NL

Hi Daniel,

Op 21-11-2017 om 14:20 schreef Daniel Schneller:

On 21. Nov. 2017, at 14:08, Lukas Tribus <lu...@ltri.eu> wrote:
[...]
Instead of hiding specific errors counters, why not send an actual
HTTP request that triggers a 200 OK response? So health checking is
not exempt from the statistics and only generates error statistics
when actual errors occur?

Good point. I wanted to avoid, however, having these “high level” health checks 
from the many many sidecars being routed through to the actual backends.
Instead, I considered it enough to “only” check if the central haproxy is 
available. In case it is, the sidecars rely on it doing the actual health 
checks of the backends and responding with 503 or similar, when all backends 
for a particular request happen to be down.
Maybe monitor-uri, perhaps together with 'monitor fail', could help?: 
http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#4.2-monitor-uri
It says it won't log or forward the request.. not sure, but maybe stats 
will also skip it.
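A hedged sketch of that idea (the frontend/backend names and the URI are invented here, not from the thread):

```
frontend health_fe
    bind :8080
    mode http
    # answers 200 OK itself, without logging or forwarding the request
    monitor-uri /healthz
    # and reports 503 once the real backend has no servers left
    acl app_down nbsrv(bk_app) lt 1
    monitor fail if app_down
```

Sidecars would then health-check :8080/healthz instead of a real application URL, so those probes never reach the backends or the per-backend error counters.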


However, your idea and a little more Googling led me to this Github repo 
https://github.com/jvehent/haproxy-aws#healthchecks-between-elb-and-haproxy 
where they configure a dedicated “health check frontend” (albeit in their case 
to work around an AWS/ELB limitation re/ PROXY protocol). I think I will adapt 
this and configure the sidecars to health check on a dedicated port like this.

I’ll let you know how it goes.

Thanks a lot for your thoughts, so far :)

Daniel


Regards,
PiBa-NL / Pieter




haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon parent staying alive/process-owner

2017-11-20 Thread PiBa-NL

Hi List,

I've got a startup script that essentially looks like the one below #1# 
(simplified..)
When configured with master-worker, the first parent process 2926 as 
seen in #2# keeps running.
Doing the same without master-worker, the daemon properly detaches and 
the parent exits returning possible warnings/errors..


When the second php exec line in #1# with "> /dev/null" is used instead 
it does succeed.


While its running the stats page does get served by the workers..

To avoid a possible issue with pollers (see my previous mail thread) i've 
tried to add the -dk, but still the first started parent process stays 
alive..
And if terminated with a ctrl+c it stops the other master-worker 
processes with it.. as can be seen in #3# (was from a different attempt 
so different processid's.).


'truss' output (again with different pids..): 
https://0bin.net/paste/f2p8uRU1t2ebZjkL#iJOBdPnR8mCmRrtGGkEaqsmQXfbHmQ56vQHdseh1x8U


If desired i can gather the htop/truss/console output information from a 
single run..


Any other info i can provide? Or should i change my script to not expect 
any console output from haproxy? In my original script the 'exec' is 
called with 2 extra parameters that return the console output and exit 
status..


p.s.
how should configuration/startup errors be 'handled' when using 
master-worker?
A kill -1 itself wont tell if a new configured bind cannot find the 
interface address to bind to? and a -c before hand wont find such a problem.
The end result that nothing is running and the error causing that 
however should be 'caught' somehow for logging.?. should haproxy itself 
log it to syslogs? but how will the startup script know to notify the 
user of a failure?
Would it be possible when starting haproxy with -sf  it would tell 
if the (original?) master was successful in reloading the config / 
starting new workers or how should this be done?

Currently a whole new set of master-worker processes seems to take over..

Or am i taking the wrong approach here?

Regards,
PiBa-NL / Pieter

#1# Startup script (simplified..) haproxy.sh:

#!/bin/sh
echo "Starting haproxy."
/usr/local/bin/php -q <<'ENDOFF'
<?php
    exec("/usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg -D -dk");
//    exec("/usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg -D -dk > /dev/null");
?>
ENDOFF
echo "Started haproxy..."
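The "> /dev/null" detail can be shown with a self-contained sketch (plain sh, no haproxy involved): a backgrounded child that inherits the script's stdout keeps the pipe open, so a caller capturing output until EOF hangs; redirecting the child's output lets the read finish as soon as the parent exits.

```shell
#!/bin/sh
# Illustrative sketch: the inner `sleep` stands in for a daemonized
# process. Because its stdout/stderr are redirected away from the pipe,
# the command substitution reaches EOF immediately instead of waiting
# for the background child to terminate.
out=$(sh -c '
    echo "startup warning"          # startup messages are still captured
    sleep 5 > /dev/null 2>&1 &      # stand-in for the daemonized process
')
echo "captured: $out"
```

Without the `> /dev/null 2>&1` on the background child, the same script blocks until the child exits, which is exactly the hang the startup script ran into.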


#2# process list:

  PID  PPID  PGRP  SESN TPGID NLWP USER  PRI  NI  VIRT RES S CPU% 
MEM%   TIME+  Command
 9203 1  9203  9203 0    1 root   20   0 53492  4492 S  
0.0  0.4  0:00.02    `- /usr/sbin/sshd
99097  9203 99097 99097 0    1 root   20   0 78840  7608 S  0.0  
0.8  0:01.04    |  `- sshd: root@pts/0
99900 99097 99900 99900  2651    1 root   24   0 13084  2808 S  0.0  
0.3  0:00.01    |  |  `- -sh
  161 99900   161 99900  2651    1 root   52   0 13084  2688 S  
0.0  0.3  0:00.00    |  | `- /bin/sh /etc/rc.initial
 3486   161  3486 99900  2651    1 root   20   0 13392  3696 S  
0.0  0.4  0:00.19    |  |    `- /bin/tcsh
 2651  3486  2651 99900  2651    1 root   21   0 13084  2660 S  
0.0  0.3  0:00.00    |  |   `- /bin/sh 
/usr/local/etc/rc.d/haproxy.sh start
 2801  2651  2651 99900  2651    1 root   27   0 232M 19500 S  0.0  
2.0  0:00.07    |  |  `- /usr/local/bin/php -q
 2926  2801  2651 99900  2651    1 root   29   0 0 0 Z  0.0  
0.0  0:00.01    |  | `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk
 3061 1  2651 99900  2651    1 root   31   0 28288  7420 S  
0.0  0.7  0:00.00    `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk
 3524  3061  3524  3524 0    1 root   20   0 28288  7436 S  
0.0  0.7  0:00.04    |  `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk
 3432  3061  3432  3432 0    1 root   20   0 28288  7436 S  
0.0  0.7  0:00.04    |  `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk
 3276  3061  3276  3276 0    1 root   20   0 28288  7436 S  
0.0  0.7  0:00.04    |  `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk
 3103  3061  3103  3103 0    1 root   20   0 28288  7436 S  
0.0  0.7  0:00.04    |  `- /usr/local/sbin/haproxy -f 
/var/etc/haproxy/haproxy.cfg -D -dk



#3# starting script from ssh and terminating it with Ctrl+C:

[2.4.3-DEVELOPMENT][root@pfSe.localdomain]/root: 
/usr/local/etc/rc.d/haproxy.sh

Starting haproxy.
[WARNING] 324/010345 (94381) : config : missing timeouts for proxy 
'HAProxyLocalStats'.
   | While not properly invalid, you will certainly encounter various 
problems
   | with such a configuration. To fix this, please ensure that all 
following

   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 324/010345 (94381) : Proxy 'HAProxyLocalStats': in 
multi-process mode, stats will be limited to process assigned to the 
current request.
[WARNING] 324/010345 (94381) : Proxy 'HAProxyLocalStats': stats admin 
will not work corr

Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon mode does not bind.?.

2017-11-20 Thread PiBa-NL

Hi Willy,

Op 20-11-2017 om 22:08 schreef Willy Tarreau:

OK thank you. I suspect something wrong happens, such as the master
killing the same kevent_fd as the other ones are using or something
like this.

Could you please try the attached patch just in case it fixes anything?
I have not tested it and it may even break epoll, but one thing at a
time :-)

Thanks,
Willy


Your patch fixes the issue.

If you've got a definitive patch lemme know.

Thanks,

PiBa-NL




Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon mode does not bind.?.

2017-11-20 Thread PiBa-NL

Hi Willy,
Op 20-11-2017 om 21:46 schreef Willy Tarreau:

Hi Pieter,

On Mon, Nov 20, 2017 at 01:47:48AM +0100, PiBa-NL wrote:

Hmmm thinking about it there might be something. Could you start with
"-dk" to disable kqueue and fall back to poll ? kqueue registers a post-
fork function to close and reopen the kqueue fd. I wouldn't be surprized
if we're having a problem with it not being placed exactly where needed
when running in master-worker mode. Or maybe we need to call it twice
when forking into background and one call is missing somewhere.

Thanks!
Willy


With -dk it starts in background and serves the stats page as expected.

So seems indeed related to the poller used in combination with 
master-worker.
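For completeness, the same fallback can be pinned in the configuration instead of the command line; haproxy's global section accepts a `nokqueue` keyword which, like `-dk`, disables kqueue so the next-preferred poller (poll) is used:

```
global
    nokqueue    # skip kqueue; the "Available polling systems" list then picks poll
```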


Regards,

PiBa-NL




Re: haproxy-1.8-rc4 - FreeBSD 11.1 - master-worker daemon mode does not bind.?.

2017-11-20 Thread PiBa-NL
A little bump.. Wondering if the truss attachments maybe got my mail 
blocked.. ( the mail doesn't show on 
https://www.mail-archive.com/haproxy@formilux.org/maillist.html )

They were 107KB total in 2 attachments.. Now in 0bin links:
https://0bin.net/paste/GIcxOfap-GYPrO7H#OnvGNxx2k41SLEK6VxJk9n-mD7vv/vQe/Pj33VRqdju
https://0bin.net/paste/sJ955XNt2hE1a9mF#xsMP2tzydlK3BVpxo2nNRl878SRbxZNAUpRw5-YhwdM

Op 20-11-2017 om 1:47 schreef PiBa-NL:

Hi List,

After compiling haproxy 1.8-rc4 (without modifications) on FreeBSD11.1 
i'm trying to run it with master-worker option.


I can run it with the following config from a ssh console:

global
    #daemon
    master-worker
   nbproc 4

listen HAProxyLocalStats
    bind :2200 name localstats
    mode http
    stats enable
    stats refresh 2
    stats admin if TRUE
    stats uri /
    stats show-desc Test2

It then starts 5 haproxy processes and the stats page works, being 
served from one of the workers.


However if i start it with the 'daemon' option enabled or the -D 
startup parameter, it starts in background, also starts 4 workers, but 
then doesn't respond to browser requests..
Sending a 'kill -1' to the master does start new workers see output 
below.


Truss output attached from commands below with a few requests to the 
stats page..

 truss -dfHo /root/haproxy-truss.txt -f haproxy -f /root/hap.conf -D
 truss -dfHo /root/haproxy-truss.txt -f haproxy -f /root/hap.conf

truss shows that 'accept4' isn't called when run in daemon mode..

Am i doing something wrong? Or how can i check this further?

Regards,
PiBa-NL / Pieter


root@:/ # haproxy -vv
HA-Proxy version 1.8-rc4-cfe1466 2017/11/19
Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -pipe -fstack-protector -fno-strict-aliasing 
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label 
-DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support.
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with network namespace support.
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe


root@:/ # haproxy -f /root/hap.conf -D
[WARNING] 323/005936 (1604) : config : missing timeouts for proxy 
'HAProxyLocalStats'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 323/005936 (1604) : Proxy 'HAProxyLocalStats': in 
multi-process mode, stats will be limited to process assigned to the 
current request.
[WARNING] 323/005936 (1604) : Proxy 'HAProxyLocalStats': stats admin 
will not work correctly in multi-process mode.

root@:/ # kill -1 1605
root@:/ # [WARNING] 323/005936 (1605) : Reexecuting Master process
[WARNING] 323/005945 (1605) : config : missing timeouts for proxy 
'HAProxyLocalStats'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[WARNING] 323/005945 (1605) : Proxy 'HAProxyLocalStats': in 
multi-process mode, stats will be limited to process assigned to the 
current request.
[WARNING] 323/005945 (1605) : Proxy 'HAProxyLocalStats': stats admin 
will not work correctly in multi-process mode.

[WARNING] 323/005945 (1605) : Former worker 1607 left with exit code 0
[WARNING] 323/005945 (1605) : Former worker 1606 left with exit code 0
[WARNING] 323/005945 (1605) : Former worker 1608 left with exit code 0
[WARNING] 323/005945 (1605) : Former worker 1609 left with exit code 0







haproxy-1.8-rc4 - FreeBSD 11.1 - build error: undefined reference, plock.h __unsupported_argument_size_for_pl_try_s__

2017-11-19 Thread PiBa-NL

Hi haproxy-list,

I'm trying to build 1.8-rc4 on FreeBSD 11.1, but it throws a few errors 
for me:


src/listener.o: In function `listener_accept':
/usr/ports/net/haproxy-devel/work/haproxy-1.8-rc4/src/listener.c:455: 
undefined reference to `__unsupported_argument_size_for_pl_try_s__'

src/signal.o: In function `__signal_process_queue':
/usr/ports/net/haproxy-devel/work/haproxy-1.8-rc4/src/signal.c:74: 
undefined reference to `__unsupported_argument_size_for_pl_try_s__'

src/fd.o: In function `fd_process_cached_events':
/usr/ports/net/haproxy-devel/work/haproxy-1.8-rc4/src/fd.c:248: 
undefined reference to `__unsupported_argument_size_for_pl_try_s__'

cc: error: linker command failed with exit code 1 (use -v to see invocation)

Removing lines 119 & 120 from plock.h makes the build succeed. But I am 
not sure what, if anything, gets broken by doing so?


With those lines removed I get the result below from haproxy -vv, which 
looks good :) but I didn't actually start it yet with a proper config.


Regards,

PiBa-NL / Pieter

root@:/usr/ports/net/haproxy-devel # haproxy -vv
HA-Proxy version 1.8-rc4-cfe1466 2017/11/19
Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -pipe -g -fstack-protector -fno-strict-aliasing 
-fno-strict-aliasing -Wdeclaration-after-statement -fwrapv 
-Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label 
-DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 
USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 
USE_PCRE_JIT=1


Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support.
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), 
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")

Built with network namespace support.
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
 kqueue : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
    [TRACE] trace
    [COMP] compression
    [SPOE] spoe

root@:/usr/ports/net/haproxy-devel #




Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-10 Thread PiBa-NL

Hi Willy,

Op 9-11-2017 om 5:45 schreef Willy Tarreau:

Hi Pieter,

We had something similar on Linux in relation with TCP splicing and the
fd cache, for which a fix was emitted. But yesterday Christopher explained
me that the fix has an impact on the way applets are scheduled in 1.8, so
actually it could mean that the initial bug might possibly cover a larger
scope than splicing only, and that recv+send might also be affected. If
you're interested in testing, the commit in 1.7 is
c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and
is present in the latest snapshot (we really need to emit 1.7.10 BTW).

I'd be curious to know if it fixes it or not. At least it will tell us
if that's related to this fd cache thing or to something completely
different such as Lua.

I also need to check with Thierry if we could find a way to add some
stats about the time spent in Lua to "show info" to help debugging such
cases where Lua is involved.

By the way, thanks for your dump, we'll check the sessions' statuses.
There are not that many, and maybe it will give us a useful indication!

Cheers,
Willy


Okay, I have been running with haproxy-ss-20171017 for a day now. So far 
it sticks to <1% cpu usage.


Will report if anything changes; I can't tell for sure as I don't have a 
clear reproduction of the issue, but my issue 'seems' fixed.


Regards,

PiBa-NL / Pieter




Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread PiBa-NL

Hi Willy, List,

Is it correct that when I build a haproxy-ss-20171017 snapshot the 
version still shows up as:

"HAProxy version 1.7.9, released 2017/08/18"
in both haproxy -vv and the stats page?

Or did i do it wrong?

p.s. I changed the Makefile like this:
PORTNAME=    haproxy-ss
PORTVERSION=    20171017
CATEGORIES=    net www
MASTER_SITES=    http://www.haproxy.org/download/1.7/src/snapshot/

And then ran:
    make clean build install NO_CHECKSUM=yes

Which did 'seem' to download the intended file.

Thanks,
PiBa-NL / Pieter



Re: HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-09 Thread PiBa-NL

Hi Willy,

Op 9-11-2017 om 5:45 schreef Willy Tarreau:

Hi Pieter,

On Thu, Nov 09, 2017 at 02:28:46AM +0100, PiBa-NL wrote:

Actually haproxy had been running at 100% for a few weeks and I didn't
notice; it does seem to keep working.

Anyhow, I thought I would try to capture the next event if it happened
again. It did, after a few hours.

After the truss output below, the last line keeps repeating very rapidly.

kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 1
(0x1)
recvfrom(1,0x8024ed972,16290,0,NULL,0x0)     = 0 (0x0)
kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)

We had something similar on Linux in relation with TCP splicing and the
fd cache, for which a fix was emitted. But yesterday Christopher explained
me that the fix has an impact on the way applets are scheduled in 1.8, so
actually it could mean that the initial bug might possibly cover a larger
scope than splicing only, and that recv+send might also be affected. If
you're interested in testing, the commit in 1.7 is
c040c1f ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") and
is present in the latest snapshot (we really need to emit 1.7.10 BTW).

I'd be curious to know if it fixes it or not. At least it will tell us
if that's related to this fd cache thing or to something completely
different such as Lua.

I also need to check with Thierry if we could find a way to add some
stats about the time spent in Lua to "show info" to help debugging such
cases where Lua is involved.

By the way, thanks for your dump, we'll check the sessions' statuses.
There are not that many, and maybe it will give us a useful indication!

Cheers,
Willy


Thanks for your time. I didn't think the 'splice' problem mentioned on 
the mailing list would be relevant for my case, so I didn't see a need 
to try the latest snapshot; I couldn't find many other recent cpu issues 
there either. But I'll try to compile the latest 1.7 snapshot, or 
perhaps just 1.7.9 with this extra patch, and see if it keeps running 
with low cpu usage for a few days. I have not compiled haproxy for a 
while; I'll see what works easiest for me, assuming I can build it on a 
separate FreeBSD machine and package/copy it to the actual 'problem 
machine' that doesn't have compilation tools on it. Hopefully my binary 
will be 'compatible'.


Will report back in a few days.

Thanks,
PiBa-NL / Pieter




HAProxy 1.7.9 FreeBSD 100% CPU usage

2017-11-08 Thread PiBa-NL

Hi List,

I've experienced an issue where haproxy 1.7.9 is using 100% cpu on 
FreeBSD 11.1p3 / pfSense 2.4.2dev.


There is very little traffic actually hitting this haproxy instance, but 
it happened for the second time in a few days now.
Actually haproxy had been running at 100% for a few weeks and I didn't 
notice; it does seem to keep working.


Anyhow, I thought I would try to capture the next event if it happened 
again. It did, after a few hours.


After the truss output below, the last line keeps repeating very rapidly.


kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 1.0 })         = 0 (0x0)
kevent(0,0x0,0,{ 1,EVFILT_READ,EV_EOF,0x0,0x0,0x0 },7,{ 0.99400 }) = 
1 (0x1)

recvfrom(1,0x8024ed972,16290,0,NULL,0x0)     = 0 (0x0)
kevent(0,{ 1,EVFILT_READ,EV_DELETE,0x0,0x0,0x0 },1,0x0,0,0x0) = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)
kevent(0,0x0,0,{ },7,{ 0.0 })         = 0 (0x0)

I tried to gather all possibly relevant info in the attached file. I'm 
not using many special configuration options, but I am using Lua to 
serve a small, simple static response. I'm not sure if it's a problem 
related to Lua, or perhaps some other issue.
I've got tcpdump and complete truss output from before and while it 
happened after a few hours, but actually just a few requests (+- 29). 
I would prefer to send these off-list though; Willy, if you want, shall 
I send them to your mail address? But maybe I have overlooked it on the 
mailing list and it's a known issue already? The last connection which I 
think caused/triggered the issue is in the screenshot (if it attaches 
right to the mail): basically just a GET request which gets an ack, 
followed 30 seconds later by a FIN,ACK packet from the client, again 
followed by an ack.


The LetsEncrypt backend that is part of the configuration never got a 
single request according to stats.


Is it a known issue?
Are tcpdump/truss output desired? (where should I send them?)
Is there any other output I can try to gather next time?

Regards,
PiBa-NL

HA-Proxy version 1.7.9 2017/08/18
  TARGET  = freebsd

[2.4.2-DEVELOPMENT][admin@pfsense.local]/root: 
/usr/local/pkg/haproxy/haproxy_socket.sh show sess all
show sess all
0x80242b800: [08/Nov/2017:19:40:18.868158] id=15 proto=tcpv4 
source=45.76.a.b:53752

  flags=0x48a, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=www (id=3 mode=http), listener=37.97.x.y:80 (id=1) 
addr=37.97.x.y:80

  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x80248f380 (state=0x04 nice=0 calls=4 exp= age=4h23m)
  txn=0x802421800 flags=0x820 meth=1 status=-1 req.st=MSG_BODY 
rsp.st=MSG_RPBEFORE waiting=0
  si[0]=0x80242ba38 (state=EST flags=0x08 endp0=CONN:0x8024ca480 
exp=, et=0x000)
  si[1]=0x80242ba60 (state=EST flags=0x4010 endp1=APPCTX:0x8024ca600 
exp=, et=0x000)

  co0=0x8024ca480 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x8024ca300
  flags=0x0025b300 fd=1 fd.state=22 fd.cache=0 updt=0
  app1=0x8024ca600 st0=0 st1=0 st2=0 applet=
  req=0x80242b810 (f=0x80c020 an=0x0 pipe=0 tofwd=-1 total=94)
  an_exp= rex= wex=
  buf=0x8024ed900 data=0x8024ed914 o=94 p=94 req.next=94 i=0 size=16384
  res=0x80242b850 (f=0x8040 an=0xa0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x783160 data=0x783174 o=0 p=0 rsp.next=0 i=0 size=0
0x80242ac00: [09/Nov/2017:00:04:24.403636] id=31 proto=unix_stream 
source=unix:1

  flags=0x88, conn_retries=0, srv_conn=0x0, pend_pos=0x0
  frontend=GLOBAL (id=0 mode=tcp), listener=? (id=1) addr=unix:1
  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x80248f4d0 (state=0x0a nice=-64 calls=1 exp=10s age=?)
  si[0]=0x80242ae38 (state=EST flags=0x08 endp0=CONN:0x8024ca900 
exp=, et=0x000)
  si[1]=0x80242ae60 (state=EST flags=0x4018 endp1=APPCTX:0x8024ca780 
exp=, et=0x000)
  co0=0x8024ca900 ctrl=unix_stream xprt=RAW data=STRM 
target=LISTENER:0x8024ca000

  flags=0x0020b306 fd=2 fd.state=25 fd.cache=0 updt=0
  app1=0x8024ca780 st0=7 st1=0 st2=3 applet=
  req=0x80242ac10 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=15)
  an_exp= rex=10s wex=
  buf=0x8024e7dc0 data=0x8024e7dd4 o=0 p=0 req.next=0 i=0 size=16384
  res=0x80242ac50 (f=0x80008002 an=0x0 pipe=0 tofwd=-1 total=1198)
  an_exp= rex= wex=
  buf=0x8025603c0 data=0x8025603d4 o=1198 p=1198 rsp.next=0 i=0 
size=16384


FreeBSD pfsense.local 11.1-RELEASE-p3 FreeBSD 11.1-RELEASE-p3 #362 
r313908+9cf44ec5484(RELENG_2_4): Fri Nov  3 08:23:14 CDT 2017

[2.4.2-DEVELOPMENT][admin@pfsense.local]/root: haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU = generic
  CC  = cc
  CFLAGS  = -O2

Re: confusion regarding usage of haproxy for large number of connections

2017-10-27 Thread PiBa-NL

Hi,
Op 27-10-2017 om 14:58 schreef kushal bhattacharya:

Hi,
I am confused regarding the readme text 'This is a development 
version, so it is expected to break from time to time, to add and 
remove features without prior notification and it should not be used 
in production'. Here I am testing 8000 connections being distributed 
to three virtual mqtt brokers having the same ip address but three 
different ports. I am getting a maximum threshold of 2000 connections 
being handled in this setup. Haproxy is listening on a port for 
incoming client connections and distributing them to the 3 mqtt 
brokers with the configuration file given below



defaults
    mode tcp
    maxconn  8000
    timeout connect    5000s
    timeout client 5000s
    timeout server 5000s

frontend localnodes
    bind *:9875
    log global
    log 127.0.0.1:514 local0 info
    option tcplog

    default_backend nodes


backend nodes
    mode tcp
    balance roundrobin
    server web01 192.168.0.5:9878  maxconn 3000
    server web02 192.168.0.5:9877  maxconn 3000
    server web03 192.168.0.5:9876  maxconn 2000

With this configuration can I handle an 8000-connection load in my 
setup, or do I have to make some changes here?

Thanks,
Kushal


Add a 'maxconn 8000' in 'global' section?
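For what it's worth: the `maxconn 8000` in the `defaults` section above does not raise the process-wide limit, and the compiled-in default visible in `haproxy -vv` output ("maxconn = 2000") matches the 2000-connection ceiling observed. A minimal sketch of the suggested change:

```
global
    maxconn 8000
```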

Regards,

PiBa-NL



Re: Need to understand logs

2017-09-11 Thread PiBa-NL

Hi Rajesh, Aleksander,
Op 11-9-2017 om 10:32 schreef Rajesh Kolli:

Hi Aleksandar,

Thank you for clarifying about "Layer 4" checks.

I am interested in knowing the values of these %d/%d, %s in line 319. Why 
is it taking only 1/2, 1/3... values? What are they representing?

Have you seen rise & fall in the documentation?
http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#rise
http://cbonte.github.io/haproxy-dconv/1.8/snapshot/configuration.html#5.2-fall
Basically it takes by default 3 consecutive failed checks to mark a 
server down, and 2 passed checks to get it back up.

So 1/3 is 1 failed check, but the server status is still 'up'.
Then 2/3: two failed checks, but still marked up.
At 3/3 the server would be marked down and removed from the backend pool.
Then, after a while, when the webserver is working again the following 
will happen.

After the first successful check (1/2) the server is still marked 'down'.
And on the second successful check (2/2) it will be marked 'up' and is 
added back into the backend server pool to take requests.

  319   chunk_appendf(, ", status: %d/%d %s",
  320       (check->health >= check->rise) ? check->health - check->rise + 1 : check->health,
  321       (check->health >= check->rise) ? check->fall : check->rise,
  322       (check->health >= check->rise) ? (s->uweight ? "UP" : "DRAIN") : "DOWN");
  323
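The rise/fall counting described above can be sketched like this (illustrative Python, not HAProxy's actual implementation; the real code also handles DRAIN and the initial state):

```python
def run_checks(results, rise=2, fall=3):
    """Return the 'n/m UP|DOWN' status strings logged for each check
    that moves the counter, given a sequence of pass/fail results."""
    up, streak, log = True, 0, []
    for passed in results:
        if passed == up:          # result agrees with current state:
            streak = 0            # any opposing streak is reset
            continue
        streak += 1
        needed = fall if up else rise   # checks required to flip state
        if streak >= needed:
            up, streak = not up, 0      # threshold reached: flip state
            log.append(f"{needed}/{needed} {'UP' if up else 'DOWN'}")
        else:
            log.append(f"{streak}/{needed} {'UP' if up else 'DOWN'}")
    return log

# Three failures take the server down, two successes bring it back:
print(run_checks([False, False, False, True, True]))
# ['1/3 UP', '2/3 UP', '3/3 DOWN', '1/2 DOWN', '2/2 UP']
```

Note how this reproduces the "status: 1/2 DOWN" line from the quoted logs: one passed check on a down server, with one more needed before it is marked up.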

Thanks & Regards
Rajesh Kolli

-Original Message-
From: Aleksandar Lazic [mailto:al-hapr...@none.at]
Sent: Sunday, September 10, 2017 9:37 PM
To: Rajesh Kolli; haproxy@formilux.org
Subject: Re: Need to understand logs

Hi Rajesh.

Rajesh Kolli wrote on 08.09.2017:


Hi Aleksandar,
Thank you for your response. Yes, I am using "Log-health-checks" in my
configuration and here is my HAProxy version information.

Thanks.

Sorry to say, but for now you can only take a look into the source for 
documentation.

http://git.haproxy.org/?p=haproxy-1.7.git=search=HEAD=grep=PR_O2_LOGHCHKS

for example.

http://git.haproxy.org/?p=haproxy-1.7.git;a=blob;f=src/checks.c;hb=640d526f8cdad00f7f5043b51f6a34f3f6ebb49f#l307

We are open to patches, also for documentation, to add this part to the 
docs ;-)

To answer your question below: I think layer 4 checks are 'only' tcp 
checks, which are sometimes answered by the OS itself when a service is 
listening on the specific port.

This does not mean that the app works properly.
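In other words, a "Layer4 check passed" boils down to a plain TCP connect with no application-level request at all; a rough sketch (not HAProxy's code):

```python
import socket

def layer4_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True   # handshake completed (SYN/ACK received)
    except OSError:
        return False      # connection refused or timed out
```

So a passed L4 check only proves that something accepted the TCP handshake, which can be the OS's listen queue rather than a healthy application.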

I'm open to any correction if my assumption is wrong.

Regards
Aleks


[root@DS-11-82-R7-CLST-Node1 ~]# haproxy -vv HA-Proxy version 1.7.8
2017/07/07 Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>
Build options :
   TARGET  = linux2628
   CPU = generic
   CC  = gcc
   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement 
-fwrapv

   OPTIONS =
Default settings :
   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents =
200



Thanks & Regards
Rajesh Kolli
-Original Message-
From: Aleksandar Lazic [mailto:al-hapr...@none.at]
Sent: Thursday, September 07, 2017 10:08 PM
To: Rajesh Kolli; haproxy@formilux.org
Subject: Re: Need to understand logs
Hi Rajesh.
Rajesh Kolli wrote on 07.09.2017:

Hello,

I have been using the HAProxy community version for a month; I need to 
understand HAProxy's logs, and for that I need your help.

Here is a sample of my logs:
Sep  6 17:03:31 localhost haproxy[19389]: Health check for server
Netrovert-sites/DS-11-81-R7-CLST-Node2 succeeded, reason: Layer4
check passed, check duration: 0ms, status: 1/2 DOWN.
Sep  6 17:03:33 localhost haproxy[19389]: Health check for server
Netrovert-sites/DS-11-81-R7-CLST-Node2 succeeded, reason: Layer4
check passed, check duration: 0ms, status: 3/3 UP.
Sep  6 17:03:33 localhost haproxy[19389]: Server
Netrovert-sites/DS-11-81-R7-CLST-Node2 is UP. 2 active and 0 backup
servers online. 0 sessions requeued, 0 total in queue.

Here my doubts are: in the first line the health check is 1/2 DOWN, and 
in the 2nd line it is 3/3 UP; in both cases 'Layer4 check passed'. How 
to understand it? What exactly is it checking? What are these 1/2 & 1/3's?

Finally, is there any document to understand its logging?

There is a logging part in the doc but I haven't seen such entries in 
the document.


http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#8
Maybe you have activated
http://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-option%20log-health-checks
in your config.
in your config.



It would be nice to know which haproxy version you use.
haproxy -vv
--
Best Regards
Aleks
https://www.me2digital.com/




--
Best Regards
Aleks



Regards,

PiBa-NL




Re: ssl & default_backend

2017-04-03 Thread PiBa-NL

Hi Antonio,

Op 3-4-2017 om 13:29 schreef Antonio Trujillo Carmona:

It's well documented that Windows XP with Internet Explorer doesn't
support SNI, so I try to redirect the call through "default_backend", 
but I get ERROR-404; it works fine with every other OS/browser 
combination.
If I (only for test purposes) comment out the four lines with 
"ssiiprovincial" (which means all the traffic must be redirected through 
default_backend), it doesn't work with any OS/browser.



frontend Aplicaciones
 bind *:443
 mode tcp
 log global
 tcp-request inspect-delay 5s
 tcp-request content accept if { req_ssl_hello_type 1 }

 # Parametros para utilizar SNI (Server Name Indication)
 acl aplicaciones req_ssl_sni -i aplicaciones.gra.sas.junta-andalucia.es
 acl citrixsf req_ssl_sni -i ssiiprovincial.gra.sas.junta-andalucia.es
 acl citrixsf req_ssl_sni -i ssiiprovincial01.gra.sas.junta-andalucia.es
 acl citrixsf req_ssl_sni -i ssiiprovincial.hvn.sas.junta-andalucia.es
 acl citrixsf req_ssl_sni -i ssiiprovincial01.hvn.sas.junta-andalucia.es

 use_backend CitrixSF-SSL if citrixsf
 use_backend SevidoresWeblogic-12c-Balanceador-SSL
There is no acl for the use_backend above, so probably the 
default_backend below will never be reached.

Could it be that the backend above returns the 404 you're seeing?

 default_backend CitrixSF-SSL
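For illustration only (guessing at the intent; the acl name is the one already defined in the config above), the use_backend line would presumably need a condition such as:

```
 use_backend SevidoresWeblogic-12c-Balanceador-SSL if aplicaciones
```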


Regards,

PiBa-NL




Re: Conditionally terminating SSL based on SNI

2016-09-21 Thread PiBa-NL

Hi Willy, Christopher,

Do you perhaps have a small update about the "[PATCH] MAJOR: ssl: add 
'tcp-fallback' bind option for SSL listeners"?
I've not seen any new information about it for a while; will it come 
with 1.7devX? Or should there first be a solid http/2 implementation 
before expecting this feature?


@Sven,
On the mailing list the above-mentioned patch was posted 11 March 2016; 
it made this kind of feature possible.
I 'think' it should still be possible to apply most of it against the 
current version's sources; just change the two flags x2.


Thanks as always,
Pieter

Op 21-9-2016 om 17:52 schreef Sven Marnach:

Hi,

I'd like to configure haproxy to listen on a single IP address and 
port 443.  Based on the SNI information of the incoming connections, 
I'd like to terminate some of the SSL connections on the proxy and 
send plain HTTP requests to the backend.  For other domain names, 
however, I'd like to operate in TCP mode and simply cut through the 
connection to the backend, wihtout decrypting the traffic.


The only solution I managed to cook up after some experimentation 
involves looping back to haproxy itself:


frontend fe_https_dispatch
bind *:443
mode tcp
tcp-request inspect-delay 5s
tcp-request content accept if { req.ssl_hello_type 1 }
use_backend be_lets_encrypt if { req.ssl_sni -m end .acme.invalid }
default_backend be_https_loopback

backend be_lets_encrypt
mode tcp
server srv_lets_encrypt 127.0.0.1:63443 

backend be_https_loopback
mode tcp
server srv_https_loopback 127.0.0.1:36427 

frontend fe_https_loopback
bind *:36427 ssl crt /etc/ssl/certs/ strict-sni
mode http
use_backend be_foo if { req.ssl_sni -i foo.example.com }
use_backend be_bar if { req.ssl_sni -i bar.example.com }


[… backend definitions of be_foo and be_bar …]

This feels like a hack, and I also wonder whether this has performance 
implications, since each request is parsed twice by haproxy.  Is there 
any way to achieve this without looping back to haproxy?


Cheers,
--
Sven
@OpenCraft




