Re: httpchk POST adding "Connection: close" after data

2017-12-07 Thread Willy Tarreau
Hi Rahul,

On Fri, Dec 08, 2017 at 12:00:51PM +0530, Rahul Ghanate wrote:
> Hi,
> 
> Here is the post I have added on discourse about the issue with httpchk
> POST request, which you can find at,
> https://discourse.haproxy.org/t/httpchk-post-adding-connection-close-after-data/1875
> The
> content of the post is,
> 
> ===
> 
> I have setup haproxy 1.7.8 with option httpchk for my backend servers and
> have working GET requests.
> But while configuring the POST request with json data, I am getting error
> code 400.
> 
> Here is my check added in backend block,
> 
> option httpchk POST /myService/endpt HTTP/1.1\r\nContent-Type:\
> application/json;charset=UTF-8\r\nContent-Length:\
> 169\r\n\r\n{\"inputs\":[{\"id\":1,\"productType\":\"productType\",\"productDescription\":\"productDescription\",\"metaDescription\":\"metaDescription\",\"metaTitle\":\"metaTitle\",\"rawxyz\":\"rawxyz\"}]}
> http-check expect rstatus (2|3)[0-9][0-9]
> 
> After debugging with wireshark capture I came to know that, haproxy is
> adding *\r\f"Connection: close"\r\f\r\f* at the end of the post json data.
> >From this manual, https://www.haproxy.org/download/1.7/doc/configuration.txt,
> I found that haproxy appends it if httpchk is combined with *http-check
> expect*.
> But it should be added to header fields and not after data.
> 
> This is causing packet parse failure, as it considering POST data as a part
> of header and reporting extra CRLF in headers.
> 
> [image: image]
> image.png765x161 32.2 KB
> 
> 
> I would need the http-check expect block to verify error code, but then how
> would I avoid adding *Connection: close* at the end.

Good diagnostic. I didn't remember we had this. I'm not sure what the best
solution is now. The problem with checks is that there's no notion of
headers or body, it's just a data block that is sent. Maybe we could modify
the code to try to modify the buffer contents in place. I'm seeing that
the send-state header has the same problem a few lines above in the code
by the way.

I'm wondering if instead we shouldn't add a new http-check directive that
would help define the body to send after the request and which would
automatically compute the content-length as a bonus. We could then have
something like this :

  option httpchk POST /myService/endpt HTTP/1.1\r\nContent-Type:\
  application/json;charset=UTF-8
  http-check body 
{\"inputs\":[{\"id\":1,\"productType\":\"productType\",\"productDescription\":\"productDescription\",\"metaDescription\":\"metaDescription\",\"metaTitle\":\"metaTitle\",\"rawxyz\":\"rawxyz\"}]}
  http-check expect rstatus (2|3)[0-9][0-9]

And maybe layer we could imagine having "http-check header" to add some
headers instead of the usual trick with \r\n above. I don't have much time
to estimate the amount of work needed for this given the pending issues on
HTTP/2, but I'm willing to help anyone who wants to give it a try, and even
to try to backport it to 1.7 as a fix.

Willy



Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread Willy Tarreau
Hi Pieter,

On Thu, Dec 07, 2017 at 11:02:13PM +0100, PiBa-NL wrote:
> Made a new version of it with a bit of extra comments inside the code,
> removed a unrelated white-space change, and added a matching patch
> description.

OK, now applied, thank you!

Willy



Re: 1.8.1-fe66fd doesn't finish startup and doesn't listen to sockets

2017-12-07 Thread Willy Tarreau
Hi Pavlos,

On Thu, Dec 07, 2017 at 10:18:02PM +0100, Pavlos Parissis wrote:
> On 07/12/2017 07:41 uu, Willy Tarreau wrote:
> > It looks like it doesn't finish to startup in fact. Are you seeing it spin
> > on the CPU maybe ?
> 
> Yes, it does. I am sorry but I didn't notice it

no pb.

> pparissis at poseidonas in ~/repo/haproxy-1.8 on (master u=)
> sudo gdb ./haproxy
> GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./haproxy...done.
> (gdb) run -f /etc/haproxy/haproxy-ams4-dc.cfg
> Starting program: /home/pparissis/repo/haproxy-1.8/haproxy -f
> /etc/haproxy/haproxy-ams4-dc.cfg
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [WARNING] 340/221611 (13628) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] 
> : a
> 'http-request' rule placed after a 'use_backend' rule will still be processed 
> before.
> ^C
> Program received signal SIGINT, Interrupt.
> 0x5562e302 in register_name (name=0x558ee294 "selected_dc_backup",
> len=18, scope=scope@entry=0x7fffd83c,
> alloc=alloc@entry=1, err=0x7fffdb28) at src/vars.c:215
> 215   HA_RWLOCK_WRLOCK(VARS_LOCK, _names_rwlock);

Ah, this looks like a double-lock :-)
CCing Christopher who knows better than me how to track them.

> (gdb) bt full
> #0  0x5562e302 in register_name (name=0x558ee294 
> "selected_dc_backup",
> len=18, scope=scope@entry=0x7fffd83c,
> alloc=alloc@entry=1, err=0x7fffdb28) at src/vars.c:215
> i = 
> var_names2 = 
> tmp = 
> res = 0x0
> #1  0x5562f489 in vars_check_arg (arg=0x558ee600, err= out>)
> at src/vars.c:502
> name = 
> scope = SCOPE_REQ
> err = 
> arg = 0x558ee600
> #2  0x55609b94 in sample_parse_expr (str=str@entry=0x558ee598,
> idx=idx@entry=0x7fffd964,
> file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg",
> line=line@entry=106, err_msg=err_msg@entry=0x7fffdb28,
> al=al@entry=0x558e85d8) at src/sample.c:901
> begw = 
> endw = 0x558df4c2 "req.selected_dc_backup)"
> endt = 0x558df4d8 ")"
> expr = 0x558ee5c0
> fetch = 0x558ade30 
> conv = 
> prev_type = 6
> fkw = 0x558ee2f0 "var"
> ckw = 0x0
> err_arg = 1
> #3  0x55630144 in parse_acl_expr (args=args@entry=0x558ee598,
> err=err@entry=0x7fffdb28, al=al@entry=0x558e85d8,
> file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg", 
> line=106)
> at src/acl.c:351
> expr = 
> aclkw = 0x0
> refflags = 
> patflags = 
> arg = 
> smp = 0x0
> idx = 0
> ckw = 0x0
> begw = 
> endw = 
> endt = 
> cur_type = 
> nbargs = 
> operator = 2
> op = 
> contain_colon = 
> have_dot = 
> dot = 
> value = 3
> minor = 40
> buffer = "\000\000\000\000\000\000\000\000\003\000\000\000\060", 
> '\000'
> , "[\000\000\000n\000\000\000\000"
> is_loaded = 
> unique_id = 
> error = 0x50 
> ref = 
> pattern_expr = 
> load_as_map = 0
> acl_conv_found = 0
> #4  0x55630e50 in parse_acl (args=args@entry=0x558ee590,
> known_acl=known_acl@entry=0x558e6de8,
> err=err@entry=0x7fffdb28, al=al@entry=0x558e85d8,
> file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg",
> line=line@entry=106) at src/acl.c:725
> cur_acl = 
> acl_expr = 
> name = 
> pos = 
> #5  0x55631293 in parse_acl_cond (args=0x7fffdc68,
> known_acl=0x558e6de8, pol=,
> err=err@entry=0x7fffdb28, al=al@entry=0x558e85d8,
> file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg",
> line=106) at src/acl.c:976
> arg_end = 6
> args_new = 0x558ee590
> arg = 
> neg = 0
> word = 
> cur_acl = 
> cur_term = 
> cur_suite = 0x558ee450
> cond = 0x558ee400
> suite_val = 524287
> #6  0x55631611 in build_acl_cond 

Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread PiBa-NL

Hi Christopher, Willy,

Op 7-12-2017 om 19:33 schreef Willy Tarreau:

On Thu, Dec 07, 2017 at 04:27:16PM +0100, Christopher Faulet wrote:

Honestly, I don't know which version is the best.

Just let me know guys :-)
imho Christopher's patch is smaller and probably easier to maintain and 
eventually remove without adding (unneeded) code to the 
set_server_check_status(). Though it is a bit less obvious to me that it 
will have the same effect, i works just as well.



Email alerts should
probably be rewritten to not use the checks. This was the only solution to
do connections by hand when Simon implemented it. That's not true anymore.

I agree and I think I was the one asking Simon to do it like this by then
eventhough he didn't like this solution. That was an acceptable tradeoff
in my opinion, with very limited impact on existing code. Now with applets
being much more flexible we could easily reimplement a more complete and
robust SMTP engine not relying on hijacking the tcp-check engine anymore.

Willy


A 'smtp engine' for sending email-alert's might be nice eventually but 
that is not easily done 'today'. (not by me anyhow) (Would it group 
messages together if multiple are created within a short time-span?)


As for the current issue / patch, i prefer the solution Christopher 
found/made.


Made a new version of it with a bit of extra comments inside the code, 
removed a unrelated white-space change, and added a matching patch 
description.
Or perhaps Christopher can create it under his own name? Either way is 
fine for me. :)


Regards,
PiBa-NL / Pieter

From 3129e1ae21e41a026f6d067b3658f6643835974c Mon Sep 17 00:00:00 2001
From: PiBa-NL 
Date: Wed, 6 Dec 2017 01:35:43 +0100
Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
 email-alert task

This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and avoids 
sending lots of emails when 'option log-health-checks' is used. It is avoided 
to change the server state and possibly queue a new email while
processing the email alert by setting check->status to HCHK_STATUS_UNKNOWN 
which will exit the set_server_check_status(..) early.

This needs to be backported to 1.8.
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index eaf84a2..3a6f020 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -3145,7 +3145,7 @@ static struct task *process_email_alert(struct task *t)
t->expire = now_ms;
check->server = alert->srv;
check->tcpcheck_rules = >tcpcheck_rules;
-   check->status = HCHK_STATUS_INI;
+   check->status = HCHK_STATUS_UNKNOWN; // the 
UNKNOWN status is used to exit set_server_check_status(.) early
check->state |= CHK_ST_ENABLED;
}
 
-- 
2.10.1.windows.1



Re: 1.8.1-fe66fd doesn't finish startup and doesn't listen to sockets

2017-12-07 Thread Pavlos Parissis
On 07/12/2017 10:18 μμ, Pavlos Parissis wrote:
> On 07/12/2017 07:41 μμ, Willy Tarreau wrote:
>> Hi Pavlos!
>>
>> On Thu, Dec 07, 2017 at 07:16:54PM +0100, Pavlos Parissis wrote:
>>> Hi,
>>>
>>> OK, I haven't read the ML for ~2 weeks and a quick scan didn't reveal 
>>> anything.
>>> So, here I am asking something that may have been addressed already.
>>>
>>> Today, I decided to switch my dev env to haproxy-1.8 using current master 
>>> and I
>>> started haproxy in the same way as I have been doing with older releases:
>>>
>>> sudo ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
>>> [WARNING] 340/173007 (3104) : parsing 
>>> [/etc/haproxy/haproxy-ams4-dc.cfg:103] : a
>>> 'http-request' rule placed after a 'use_backend' rule will still be 
>>> processed before.
>>>
>>> above it didn't return and wasn't printing, expect the warning. I curled 
>>> against
>>> the IPs and got back connection error, see attached file for process 
>>> output, lsof
>>> info, build verion and haproxy.cfg.
>>>
>>> I also started in the way it is mentioned in section 3 of management 
>>> document:
>>> sudo ./haproxy  -f /etc/haproxy/haproxy-ams4-dc.cfg -D -p 
>>> /run/haproxy-ams4.pid
>>> -sf $(cat /run/haproxy-ams4.pid)
>>> cat: /run/haproxy-ams4.pid: No such file or directory
>>> [WARNING] 340/173007 (3104) : parsing 
>>> [/etc/haproxy/haproxy-ams4-dc.cfg:103] : a
>>> 'http-request' rule placed after a 'use_backend' rule will still be 
>>> processed before.
>>>
>>> but same result, haproxy didn't return and I had to CTRL-C it.
>>>
>>> I am pretty sure I am doing something stupid but I can't find it.
>>>
>>> Any ideas?
>>
>> It looks like it doesn't finish to startup in fact. Are you seeing it spin
>> on the CPU maybe ?
> 
> Yes, it does. I am sorry but I didn't notice it
> 


Found the issue. If I remove all set-var and var(req.) statements from the
config then haproxy loads.

Cheers,
Pavlos



signature.asc
Description: OpenPGP digital signature


Re: Segfault with 1.8.0 build (RHEL5, old gcc).

2017-12-07 Thread Christopher Lane
On Thu, Dec 7, 2017 at 4:36 AM, Olivier Houchard  wrote:
> Hi Christopher,
>
> On Wed, Dec 06, 2017 at 05:34:15PM -0800, Christopher Lane wrote:
>> On Mon, Dec 4, 2017 at 11:56 AM, Christopher Lane
>>  wrote:
>> >
>> >
>> >
>> > On Mon, Dec 4, 2017 at 4:22 AM Lukas Tribus  wrote:
>> >
>> >>Hello Christopher,
>> >
>> >
>> >>2017-12-01 20:59 GMT+01:00 Christopher Lane :
>> >>>
>> >>> gist with backtrace, -vv output, and config file. Also strace.
>> >>>
>> >>> https://gist.github.com/jayalane/c6dbe7918aa9635b62c874d20f57dfec
>> >>>
>> >>> It does all the listens and then right after the first epoll is done it
>> >>> has this segv. all the local variables are "optimized out"
>> >>>
>> >>> (We really want the hard-stop-after -- we get a leak of children with our
>> >>> super frequent soft-reloads).
>> >
>> >>FYI, hard-stop-after has been backported to 1.7 stable and is in all
>> >>rebuilds starting with 1.7.4:
>> >
>> >> https://www.mail-archive.com/haproxy@formilux.org/msg25494.html
>> >
>> >
>> >
>> >> Regards,
>> >> Lukas
>> >
>> > Olivier:
>> >
>> > The patch worked beautifully to keep the 1.8.0 from crashing.
>> >
>> > Lukas:
>> >
>> > Thanks for the tip, we'll consider 1.7.9 then (settled on 1.8.0 by starting
>> > out with reading the release notes for it; we are upgrading from 1.5.0).
>> >
>> > --Chris
>>
>> Unrelated to the prior contents of this thread, I found the root cause
>> for our issue (child leak).
>>
>> The haproxy was being started from a Java process using Runtime.exec()
>> and the PIDs were jammed into 1 cell of argv.  It killed the first
>> child but not the later ones, as atol("13233 13234 13235 13236")
>> returns 13233.  We have fixed the Java code.
>>
>> http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/haproxy.c;h=eb5e65b40e7b8b2a4f8fb04b3552401e42fb0a89;hb=HEAD#l1421
>>
>> I note that the -sf parsing code uses atol and has no error checking.
>> Would the project be interested in a patch to use strtol with error
>> checks?  Could log a warning if unconsumed bytes in the arg (safer),
>> or fail to start (unsafe).  I'm sure I'm not the only one with this
>> sort of bug, given how tricky shell escaping and so on is.
>
> Yes, replacing ato* with something more reasonnable would certainly be
> appreciated :)
>
> Regards,
>
> Olivier

Didn't really see a place to put in tests for it, but did build git
master and try it with various correct and incorrect args.  Patch is
under subject:

[PATCH] CONTRIB: log: emit warning when -sf/-sd cannot parse argument


Thanks,


--Chris



Re: 1.8.1-fe66fd doesn't finish startup and doesn't listen to sockets

2017-12-07 Thread Pavlos Parissis
On 07/12/2017 07:41 μμ, Willy Tarreau wrote:
> Hi Pavlos!
> 
> On Thu, Dec 07, 2017 at 07:16:54PM +0100, Pavlos Parissis wrote:
>> Hi,
>>
>> OK, I haven't read the ML for ~2 weeks and a quick scan didn't reveal 
>> anything.
>> So, here I am asking something that may have been addressed already.
>>
>> Today, I decided to switch my dev env to haproxy-1.8 using current master 
>> and I
>> started haproxy in the same way as I have been doing with older releases:
>>
>> sudo ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
>> [WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] 
>> : a
>> 'http-request' rule placed after a 'use_backend' rule will still be 
>> processed before.
>>
>> above it didn't return and wasn't printing, expect the warning. I curled 
>> against
>> the IPs and got back connection error, see attached file for process output, 
>> lsof
>> info, build verion and haproxy.cfg.
>>
>> I also started in the way it is mentioned in section 3 of management 
>> document:
>> sudo ./haproxy  -f /etc/haproxy/haproxy-ams4-dc.cfg -D -p 
>> /run/haproxy-ams4.pid
>> -sf $(cat /run/haproxy-ams4.pid)
>> cat: /run/haproxy-ams4.pid: No such file or directory
>> [WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] 
>> : a
>> 'http-request' rule placed after a 'use_backend' rule will still be 
>> processed before.
>>
>> but same result, haproxy didn't return and I had to CTRL-C it.
>>
>> I am pretty sure I am doing something stupid but I can't find it.
>>
>> Any ideas?
> 
> It looks like it doesn't finish to startup in fact. Are you seeing it spin
> on the CPU maybe ?

Yes, it does. I am sorry but I didn't notice it

> Otherwise probably that starting it by hand in gdb and
> stopping it to see what it's doing will help.
>


pparissis at poseidonas in ~/repo/haproxy-1.8 on (master u=)
sudo gdb ./haproxy
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./haproxy...done.
(gdb) run -f /etc/haproxy/haproxy-ams4-dc.cfg
Starting program: /home/pparissis/repo/haproxy-1.8/haproxy -f
/etc/haproxy/haproxy-ams4-dc.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[WARNING] 340/221611 (13628) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] : 
a
'http-request' rule placed after a 'use_backend' rule will still be processed 
before.
^C
Program received signal SIGINT, Interrupt.
0x5562e302 in register_name (name=0x558ee294 "selected_dc_backup",
len=18, scope=scope@entry=0x7fffd83c,
alloc=alloc@entry=1, err=0x7fffdb28) at src/vars.c:215
215 HA_RWLOCK_WRLOCK(VARS_LOCK, _names_rwlock);
(gdb) bt full
#0  0x5562e302 in register_name (name=0x558ee294 
"selected_dc_backup",
len=18, scope=scope@entry=0x7fffd83c,
alloc=alloc@entry=1, err=0x7fffdb28) at src/vars.c:215
i = 
var_names2 = 
tmp = 
res = 0x0
#1  0x5562f489 in vars_check_arg (arg=0x558ee600, err=)
at src/vars.c:502
name = 
scope = SCOPE_REQ
err = 
arg = 0x558ee600
#2  0x55609b94 in sample_parse_expr (str=str@entry=0x558ee598,
idx=idx@entry=0x7fffd964,
file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg",
line=line@entry=106, err_msg=err_msg@entry=0x7fffdb28,
al=al@entry=0x558e85d8) at src/sample.c:901
begw = 
endw = 0x558df4c2 "req.selected_dc_backup)"
endt = 0x558df4d8 ")"
expr = 0x558ee5c0
fetch = 0x558ade30 
conv = 
prev_type = 6
fkw = 0x558ee2f0 "var"
ckw = 0x0
err_arg = 1
#3  0x55630144 in parse_acl_expr (args=args@entry=0x558ee598,
err=err@entry=0x7fffdb28, al=al@entry=0x558e85d8,
file=file@entry=0x558df420 "/etc/haproxy/haproxy-ams4-dc.cfg", line=106)
at src/acl.c:351
expr = 
aclkw = 0x0
refflags = 
patflags = 
arg = 
smp = 0x0
idx = 0
ckw = 0x0
begw = 
endw = 
endt = 
cur_type = 
nbargs = 
operator = 2
op = 
contain_colon = 
have_dot = 
dot = 
value = 3
minor = 40
 

[PATCH] CONTRIB: log: emit warning when -sf/-sd cannot parse argument

2017-12-07 Thread Chris Lane

Previously, -sf and -sd command line parsing used atol which cannot
detect errors.  I had a problem where I was doing -sf "$pid1 $pid2 $pid"
and it was sending the gracefully terminate signal only to the first pid.
The change uses strtol and checks endptr and errno to see if the parsing
worked.  It doesn't exit so as not to cause failures but will allow 
trouble-shooting
to be faster.
---
 src/haproxy.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index eb5e65b..3185a2e 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1412,13 +1412,24 @@ static void init(int argc, char **argv)
else
oldpids_sig = SIGTERM; /* terminate 
immediately */
while (argc > 1 && argv[1][0] != '-') {
+   char * endptr = NULL;
oldpids = realloc(oldpids, (nb_oldpids 
+ 1) * sizeof(int));
if (!oldpids) {
ha_alert("Cannot allocate old 
pid : out of memory.\n");
exit(1);
}
argc--; argv++;
-   oldpids[nb_oldpids] = atol(*argv);
+   errno = 0;
+   oldpids[nb_oldpids] = strtol(*argv, 
, 10);
+   if (errno) {
+   ha_alert("-%2s option: failed 
to parse {%s}: %s\n",
+(char *)*argv, 
strerror(errno));
+   } else if (endptr && strlen(endptr)) {
+   while (isspace(*endptr)) 
endptr++;
+   if (*endptr != 0) 
+   ha_alert("-%2s option: 
some bytes unconsumed in PID list {%s}\n",
+flag, endptr);
+   }
if (oldpids[nb_oldpids] <= 0)
usage(progname);
nb_oldpids++;
-- 
2.1.1




Quick update on pending HTTP/2 issues

2017-12-07 Thread Willy Tarreau
Guys,

just to warn you, there's currently an issue affecting HTTP/2 with POST
payloads to "slow" servers. It's a bit difficult to explain but in short
if haproxy's buffers fill up during the transfer, there's a risk that the
wakeup event to restart decoding once the buffer is released cannot be
processed immediately, and is not detected later, causing the end of data
never to be sent and the request to time out.

I now manage to reproduce it reasonably easily with curl. As long as there
is some other activity on the connection, things will continue to flow and
there will not be any problem but in this particular case we end up with a
black hole trying to process something we can't do.

This one is a bit annoying because after turning it into any direction,
it seems the only way to address it is to revisit the now very old stream
interface API, and given that each and every timeout bug we faced during
1.7 has some of its roots in there, I'd rather be extremely cautious
before trying to be creative.

I'm still working on this as time permits. In the mean time I'd suggest
that those having to support "large" POSTs (those that don't fit in
haproxy's + kernel's buffers, ie more than a few hundreds of kB) better
disable HTTP/2 for now.

I'll keep you updated once an acceptable solution is found, and now it's
getting pretty clear that we need to kill all this aging mess for 1.9!

Cheers,
Willy



Re: [PATCH] BUG: NetScaler CIP handling is incorrect

2017-12-07 Thread Willy Tarreau
Hi Andreas,

On Thu, Dec 07, 2017 at 02:41:29PM +0100, Andreas Mahnke wrote:
> Hello everybody,
> 
> Is there an update regarding the merging of the patch? We are thinking to
> switch to version 1.8 in the near future and it would be nice to have the
> patch merged, so that we do not need to merge each update on our own.

It seems nobody involved had any time to get back on this unfortunately.

Bertrand, just as a quick refresh, Andreas proposed a patch in this
thread changing the way you parse the NS CIP. If you're using it and
it works for you, then I suspect there might be several incompatible
versions in the wild and the fix could break other use cases. Otherwise
maybe it only affects a part you don't rely on. So I just need your
ACK/NACK on his patch (the first one in this thread started in July).

Thanks!
Willy



Re: 1.8.1-fe66fd doesn't finish startup and doesn't listen to sockets

2017-12-07 Thread Willy Tarreau
Hi Pavlos!

On Thu, Dec 07, 2017 at 07:16:54PM +0100, Pavlos Parissis wrote:
> Hi,
> 
> OK, I haven't read the ML for ~2 weeks and a quick scan didn't reveal 
> anything.
> So, here I am asking something that may have been addressed already.
> 
> Today, I decided to switch my dev env to haproxy-1.8 using current master and 
> I
> started haproxy in the same way as I have been doing with older releases:
> 
> sudo ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
> [WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] 
> : a
> 'http-request' rule placed after a 'use_backend' rule will still be processed 
> before.
> 
> above it didn't return and wasn't printing, expect the warning. I curled 
> against
> the IPs and got back connection error, see attached file for process output, 
> lsof
> info, build verion and haproxy.cfg.
> 
> I also started in the way it is mentioned in section 3 of management document:
> sudo ./haproxy  -f /etc/haproxy/haproxy-ams4-dc.cfg -D -p 
> /run/haproxy-ams4.pid
> -sf $(cat /run/haproxy-ams4.pid)
> cat: /run/haproxy-ams4.pid: No such file or directory
> [WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] 
> : a
> 'http-request' rule placed after a 'use_backend' rule will still be processed 
> before.
> 
> but same result, haproxy didn't return and I had to CTRL-C it.
> 
> I am pretty sure I am doing something stupid but I can't find it.
> 
> Any ideas?

It looks like it doesn't finish to startup in fact. Are you seeing it spin
on the CPU maybe ? Otherwise probably that starting it by hand in gdb and
stopping it to see what it's doing will help.

Cheers,
Willy



Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread Willy Tarreau
On Thu, Dec 07, 2017 at 04:27:16PM +0100, Christopher Faulet wrote:
> Honestly, I don't know which version is the best.

Just let me know guys :-)

> Email alerts should
> probably be rewritten to not use the checks. This was the only solution to
> do connections by hand when Simon implemented it. That's not true anymore.

I agree and I think I was the one asking Simon to do it like this by then
eventhough he didn't like this solution. That was an acceptable tradeoff
in my opinion, with very limited impact on existing code. Now with applets
being much more flexible we could easily reimplement a more complete and
robust SMTP engine not relying on hijacking the tcp-check engine anymore.

Willy



Re: MINOR: Add help text for -s switch in halog program

2017-12-07 Thread Willy Tarreau
On Tue, Dec 05, 2017 at 12:46:13AM +, Aleksandar Lazic wrote:
> Hi.
> 
> Attached a small patch for halog against 1.8 repo witch also works against
> 1.9 repo.

Ah thank you Aleks, I've wanted to do it several times in the past
and always forgot! Now applied.

Willy



Re: [PATCH 1/2] DOC: mworker: Update messages referencing exit-on-failure

2017-12-07 Thread Willy Tarreau
Hi guys!

On Tue, Dec 05, 2017 at 06:16:22PM +0100, Tim Düsterhus wrote:
> William,
> 
> Am 04.12.2017 um 16:09 schrieb William Lallemand:
> > I prefer to leave this one, because otherwise the user won't understand why 
> > it
> > killed the workers, and a grep on "exit-on-failure" in the documentation 
> > will 
> > find "no-exit-on-failure" so that's not a problem.
> > 
> 
> Okay, updated patch sent (because I was not sure whether you / Willy
> would edit it yourselves or not).

Now applied, thank you!

Willy



1.8.1-fe66fd doesn't finish startup and doesn't listen to sockets

2017-12-07 Thread Pavlos Parissis
Hi,

OK, I haven't read the ML for ~2 weeks and a quick scan didn't reveal anything.
So, here I am asking something that may have been addressed already.

Today, I decided to switch my dev env to haproxy-1.8 using current master and I
started haproxy in the same way as I have been doing with older releases:

sudo ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
[WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] : a
'http-request' rule placed after a 'use_backend' rule will still be processed 
before.

above it didn't return and wasn't printing, expect the warning. I curled against
the IPs and got back connection error, see attached file for process output, 
lsof
info, build verion and haproxy.cfg.

I also started in the way it is mentioned in section 3 of management document:
sudo ./haproxy  -f /etc/haproxy/haproxy-ams4-dc.cfg -D -p /run/haproxy-ams4.pid
-sf $(cat /run/haproxy-ams4.pid)
cat: /run/haproxy-ams4.pid: No such file or directory
[WARNING] 340/173007 (3104) : parsing [/etc/haproxy/haproxy-ams4-dc.cfg:103] : a
'http-request' rule placed after a 'use_backend' rule will still be processed 
before.

but same result, haproxy didn't return and I had to CTRL-C it.

I am pretty sure I am doing something stupid but I can't find it.

Any ideas?

Cheers,
Pavlos

foo at me in ~
curl -v http://10.52.12.2   
  
* Rebuilt URL to: http://10.52.12.2/
*   Trying 10.52.12.2...
* TCP_NODELAY set
* connect to 10.52.12.2 port 80 failed: Connection refused
* Failed to connect to 10.52.12.2 port 80: Connection refused
* Closing connection 0
foo at me in ~ *7
ps|grep haproxy 
  
root  8244  0.0  0.0  51352  3800 pts/25   S+   17:14   0:00  |   |   \_ 
sudo ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
root  8245 99.5  0.0  32468  5704 pts/25   R+   17:14   0:50  |   |   
\_ ./haproxy -f /etc/haproxy/haproxy-ams4-dc.cfg
ppariss+ 10834  0.0  0.0  12788   968 pts/26   S+   17:15   0:00  |   \_ 
grep --colour=auto haproxy

foo at me in ~
sudo lsof -n|grep haproxy   
  
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
  Output information may be incomplete.
rsyslogd542  root4u unix 0x9b4879369c000t0  
17715 /var/lib/haproxy/dev/log type=DGRAM
in:imuxso   542   570root4u unix 0x9b4879369c000t0  
17715 /var/lib/haproxy/dev/log type=DGRAM
in:imklog   542   571root4u unix 0x9b4879369c000t0  
17715 /var/lib/haproxy/dev/log type=DGRAM
in:imudp542   572root4u unix 0x9b4879369c000t0  
17715 /var/lib/haproxy/dev/log type=DGRAM
rs:main 542   573root4u unix 0x9b4879369c000t0  
17715 /var/lib/haproxy/dev/log type=DGRAM
sudo   8244  root  cwd   DIR  254,4   4096  
  6307210 /home/foo/repo/haproxy-1.8
haproxy8245  root  cwd   DIR  254,4   4096  
  6307210 /home/foo/repo/haproxy-1.8
haproxy8245  root  rtd   DIR  254,1   4096  
2 /
haproxy8245  root  txt   REG  254,47643016  
  6307218 /home/foo/repo/haproxy-1.8/haproxy
haproxy8245  root  mem   REG  254,1  47632  
   915005 /lib/x86_64-linux-gnu/libnss_files-2.24.so
haproxy8245  root  mem   REG  254,1  47688  
   915010 /lib/x86_64-linux-gnu/libnss_nis-2.24.so
haproxy8245  root  mem   REG  254,1  89064  
   915001 /lib/x86_64-linux-gnu/libnsl-2.24.so
haproxy8245  root  mem   REG  254,1  31616  
   915003 /lib/x86_64-linux-gnu/libnss_compat-2.24.so
haproxy8245  root  mem   REG  254,11689360  
   914991 /lib/x86_64-linux-gnu/libc-2.24.so
haproxy8245  root  mem   REG  254,1 468920  
   914073 /lib/x86_64-linux-gnu/libpcre.so.3.13.3
haproxy8245  root  mem   REG  254,1  10128  
   392554 /usr/lib/x86_64-linux-gnu/libpcreposix.so.3.13.3
haproxy8245  root  mem   REG  254,12686672  
   396263 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
haproxy8245  root  mem   REG  254,1 442920  
   396761 /usr/lib/x86_64-linux-gnu/libssl.so.1.1
haproxy8245  root  mem   REG  254,1 135440  
   915013 /lib/x86_64-linux-gnu/libpthread-2.24.so
haproxy8245  root  mem   REG  254,1  14640  
   914996 

What can cause RST from HAProxy 1.7.9

2017-12-07 Thread Jason Price
Thank you Joao for starting this thread, and Moemen for your reply.

More information:
1) Something on the server side is sending RST packets to client initiated
connections.
2) When it happens, all connections from a single IP address are reaped.
However, it doesn't always happen to the same IP address.
3) Sketch architecture: The server is running HAProxy on port 443.  HAProxy
then sends the HTTP connections to various backends.  As Joao indicated,
this is a Kubernetes Ingress setup, but the above is the TL;DR.
4) In my analysis of the logs, I can't find evidence of HAProxy doing the
reaping, but it might be my lack of familiarity.
5) Here is our log-format. This facilitates JSON parsing of logs.:

log-format
{\"type\":\"haproxy-ingress\",\"protocol\":\"http\",\"timestamp\":\"%t\",\"client_ip\":%{+Q}ci,\"client_port\":%{+Q}cp,\"frontend_name\":%{+Q}ft,\"backend_name\":%{+Q}b,\"backend_ip\":%{+Q}s,\"time_handshake\":%{+Q}Th,\"time_idle_before_request\":%{+Q}Ti,\"time_receive_full_request\":%{+Q}TR,\"time_wait_send_backend\":%{+Q}Tw,\"time_backend_ack\":%{+Q}Tc,\"time_backend_processing\":%{+Q}Tr,\"time_delay_send_to_client\":%{+Q}Td,\"waittime_on_client\":%{+Q}Tq,\"active_request_time\":%{+Q}Ta,\"total_request_time\":%{+Q}Tt,\"http_status_code\":%{+Q}ST,\"client_bytes_uploaded\":%{+Q}U,\"bytes_sent\":%{+Q}B,\"termination_status\":\"%tsc\",\"active_conns\":%{+Q}ac,\"frontend_conns\":%{+Q}fc,\"backend_conns\":%{+Q}bc,\"server_conns\":%{+Q}sc,\"retry_conns\":%{+Q}rc,\"server_queue\":%{+Q}sq,\"backend_queue\":%{+Q}bq,\"request\":%{+Q,+E}r}

6) Around the 15:15 minute in the logs, there were 132 RSTs.  They were all
within one second.
7) The client IPs have been changed, and the domains have been changed.
Otherwise the logs are unchanged.  The IP changes are all 1:1.  They
include the 15:13-15:16 minutes.
8) The logs can be found here:
https://gist.github.com/zapman449/ab043d9849ed78826bdb0f343559a230

Thank you for any insight;
Jason


Re: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

2017-12-07 Thread Emeric Brun
Hi Pieter,

I'm CCing Christopher, he did some test on your patch.

R,
Emeric

On 12/06/2017 07:06 AM, Willy Tarreau wrote:
> Hi Pieter,
> 
> CCing Emeric since these parts have changed a bit for threads and
> there may be some subtle things we oversee.
> 
> thanks for this!
> Willy
> 
> On Wed, Dec 06, 2017 at 02:11:53AM +0100, PiBa-NL wrote:
>> Hi List, Simon and Baptiste,
>>
>> Sending to both of you guys as its both tcp-check and email related and you
>> are the maintainers of those parts.
>> Patch subject+content basically says it all (i hope.).
>>
>> It is intended to fixes yesterdays report:
>> https://www.mail-archive.com/haproxy@formilux.org/msg28158.html
>>
>> Please let me know if it is OK, or should be done differently.
>>
>> Thanks in advance,
>> PiBa-NL / Pieter
> 
>> From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
>> From: PiBa-NL 
>> Date: Wed, 6 Dec 2017 01:35:43 +0100
>> Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from 
>> a
>>  email-alert task
>>
>> This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and 
>> avoids sending lots of emails when 'option log-health-checks' is used.
>> It is avoided to change the server state and possibly queue a new email 
>> while processing the email alert by checking if the check task is being 
>> processed for the process_email_alert struct.
>>
>> This needs to be backported to 1.8.
>> ---
>>  src/checks.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/src/checks.c b/src/checks.c
>> index eaf84a2..55bfde2 100644
>> --- a/src/checks.c
>> +++ b/src/checks.c
>> @@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);
>>  
>>  static struct pool_head *pool_head_email_alert   = NULL;
>>  static struct pool_head *pool_head_tcpcheck_rule = NULL;
>> +static struct task *process_email_alert(struct task *t);
>>  
>>  
>>  static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
>> @@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
>>   */
>>  static void set_server_check_status(struct check *check, short status, 
>> const char *desc)
>>  {
>> +if (check->task->process == process_email_alert)
>> +return; // email alerts should not change the status of the 
>> server
>> +
>>  struct server *s = check->server;
>>  short prev_status = check->status;
>>  int report = 0;
>> -- 
>> 2.10.1.windows.1
>>
> 




Re: Segfault with 1.8.0 build (RHEL5, old gcc).

2017-12-07 Thread Olivier Houchard
Hi Christopher,

On Wed, Dec 06, 2017 at 05:34:15PM -0800, Christopher Lane wrote:
> On Mon, Dec 4, 2017 at 11:56 AM, Christopher Lane
>  wrote:
> >
> >
> >
> > On Mon, Dec 4, 2017 at 4:22 AM Lukas Tribus  wrote:
> >
> >>Hello Christopher,
> >
> >
> >>2017-12-01 20:59 GMT+01:00 Christopher Lane :
> >>>
> >>> gist with backtrace, -vv output, and config file. Also strace.
> >>>
> >>> https://gist.github.com/jayalane/c6dbe7918aa9635b62c874d20f57dfec
> >>>
> >>> It does all the listens and then right after the first epoll is done it
> >>> has this segv. all the local variables are "optimized out"
> >>>
> >>> (We really want the hard-stop-after -- we get a leak of children with our
> >>> super frequent soft-reloads).
> >
> >>FYI, hard-stop-after has been backported to 1.7 stable and is in all
> >>rebuilds starting with 1.7.4:
> >
> >> https://www.mail-archive.com/haproxy@formilux.org/msg25494.html
> >
> >
> >
> >> Regards,
> >> Lukas
> >
> > Olivier:
> >
> > The patch worked beautifully to keep the 1.8.0 from crashing.
> >
> > Lukas:
> >
> > Thanks for the tip, we'll consider 1.7.9 then (settled on 1.8.0 by starting
> > out with reading the release notes for it; we are upgrading from 1.5.0).
> >
> > --Chris
> 
> Unrelated to the prior contents of this thread, I found the root cause
> for our issue (child leak).
> 
> The haproxy was being started from a Java process using Runtime.exec()
> and the PIDs were jammed into 1 cell of argv.  It killed the first
> child but not the later ones, as atol("13233 13234 13235 13236")
> returns 13233.  We have fixed the Java code.
> 
> http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/haproxy.c;h=eb5e65b40e7b8b2a4f8fb04b3552401e42fb0a89;hb=HEAD#l1421
> 
> I note that the -sf parsing code uses atol and has no error checking.
> Would the project be interested in a patch to use strtol with error
> checks?  Could log a warning if unconsumed bytes in the arg (safer),
> or fail to start (unsafe).  I'm sure I'm not the only one with this
> sort of bug, given how tricky shell escaping and so on is.

Yes, replacing ato* with something more reasonnable would certainly be
appreciated :)

Regards,

Olivier