Re: Configuring HAProxy session limits

2018-07-24 Thread Moemen MHEDHBI
Hi Àbéjídé,


On 24/07/2018 17:59, Àbéjídé Àyodélé wrote:
> Hi Friends,
>
> I am trying to bump session limits via the maxconn in the global
> section as
> below:
>
> cat /etc/haproxy/redacted-haproxy.cfg
> global
>   maxconn 1
>   stats socket /var/run/redacted-haproxy-stats.sock user haproxy group haproxy mode 660 level operator expose-fd listeners
>
> frontend redacted-frontend
>   mode tcp
>   bind :2004
>   default_backend redacted-backend
>
> backend redacted-backend
>   mode tcp
>   balance leastconn
>   hash-type consistent
>
>   server redacted_0 redacted01.qa:8443 check agent-check agent-port 8080 weight 100 send-proxy
>   server redacted-684994ccd-6rn9q 192.168.39.223:8443 check port 8443 weight 100 send-proxy
>   server redacted-684994ccd-c88d9 192.168.46.66:8443 check port 8443 weight 100 send-proxy
>   server redacted-canary-58ccdb7cf4-47f4m 192.168.53.47:8443 check port 8443 weight 100 send-proxy
>
> NOTE: I removed some portion of the config for conciseness sake.
>
> However this did not seem to have any impact on HAProxy after a reload
> as seen
> below:
>
> echo "show stat" | socat unix-connect:/var/run/redacted-haproxy-stats.sock stdio | cut -d"," -f7
> slim
> 2000
>
>
>
>
> 200

When slim appears on a frontend line (in your case: redacted-frontend),
it refers to the maxconn of that frontend, which is a separate setting
from the global maxconn. When maxconn is not specified for the frontend,
it defaults to 2000:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-maxconn
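
To read slim by column name rather than by field position, something like
the following Python sketch may help. It assumes the "show stat" output has
been captured as text; the sample below keeps only the first eight columns,
and the helper name session_limits is made up for this illustration.

```python
import csv
import io

# Shortened sample of "show stat" output: only the first eight columns are
# kept, with values taken from the dump in this thread.
SAMPLE = """\
# pxname,svname,qcur,qmax,scur,smax,slim,stot
redacted-frontend,FRONTEND,,,0,2,2000,3694
redacted-backend,redacted_0,0,0,0,1,,2
redacted-backend,BACKEND,0,0,0,2,200,3694
"""


def session_limits(show_stat_text):
    """Map (pxname, svname) to slim for every row that reports a limit."""
    # The header line starts with "# ", which is dropped before parsing.
    text = show_stat_text.lstrip("# ")
    rows = csv.DictReader(io.StringIO(text))
    return {
        (row["pxname"], row["svname"]): int(row["slim"])
        for row in rows
        if row["slim"]
    }


for key, slim in session_limits(SAMPLE).items():
    print(key, slim)
```

Server rows have an empty slim here because no per-server maxconn is set,
so only the FRONTEND and BACKEND rows are reported.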

When slim appears on a backend line (in your case: redacted-backend), it
refers to the fullconn parameter, because backends do not have a maxconn.
The fullconn parameter is a little more complicated to understand than
maxconn. You can find more information about it in the doc:
https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#4.2-fullconn
or by searching the mailing list history, but most of the time you don't
need to use it.
To understand the 200 value, consider the following statement from the
doc:
> Since it's hard to get this value right, haproxy automatically sets it
> to 10% of the sum of the maxconns of all frontends that may branch to
> this backend
So 10% of 2000 = 200.
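
The arithmetic can be sketched in a few lines of Python. This only mirrors
the sentence quoted from the documentation; the function name is made up,
and the rounding used for sums that are not a multiple of 10 is an
assumption, not something taken from the HAProxy sources.

```python
def default_fullconn(frontend_maxconns):
    """10% of the sum of the maxconn values of all frontends that may
    branch to the backend (integer division is an assumption)."""
    return sum(frontend_maxconns) // 10


# One frontend left at the default maxconn of 2000.
print(default_fullconn([2000]))

# Hypothetical case: two frontends, with maxconn 2000 and 3000, both
# branching to the same backend.
print(default_fullconn([2000, 3000]))
```

With a single frontend at the default of 2000, this gives the 200 shown in
the BACKEND row.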

++
- Moemen.
>
> I do not know where 2000 and 200 are coming from as I did not at any point
> configure that, the maxconn was previously 4096.
>
> A more detailed stats output is below:
>
> echo "show stat" | socat unix-connect:/var/run/redacted-haproxy-stats.sock stdio
> #
> pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
> redacted-frontend,FRONTEND,,,0,2,2000,3694,0,0,0,0,0,OPEN,1,2,00,3,0,9,,,0,0,0,,,0,0,0,0,tcp,,3,9,3694,,0,0,
> redacted-backend,redacted_0,0,0,0,1,,2,0,0,,0,,0,0,0,0,UP,94,1,0,0,0,1582,0,,1,3,1,,2,,2,0,,1,L4OK,,0,,,0,0,683,,via agent : up,0,0,0,0,L7OK,0,50,Layer4 check passed,Layer7 check passed,2,3,4,1,1,1,10.185.57.54:8443,,tcp
> redacted-backend,redacted-684994ccd-6rn9q,0,0,0,1,,46,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,2,,46,,2,0,,1,L4OK,,0,,,0,0,6,,,0,0,0,1Layer4 check passed,,2,3,4192.168.39.223:8443,,tcp
> redacted-backend,redacted-684994ccd-c88d9,0,0,0,1,,45,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,3,,45,,2,0,,1,L4OK,,0,,,0,0,12,,,0,0,0,0Layer4 check passed,,2,3,4192.168.46.66:8443,,tcp
> redacted-backend,redacted-canary-58ccdb7cf4-47f4m,0,0,0,1,,45,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,4,,45,,2,0,,1,L4OK,,0,,,0,0,10,,,0,0,0,1Layer4 check passed,,2,3,4192.168.53.47:8443,,tcp
> redacted-backend,BACKEND,0,0,0,2,200,3694,0,0,0,0,,0,0,0,0,UP,394,4,0,,0,1582,0,,1,3,0,,138,,1,3,,9,,0,0,0,0,0,0,6,,,0,0,0,1,,tcp,leastconn,,,
>
> I need guidance on what I need to do to configure session limits
> correctly and
> also make it reflect in the exported metrics.
>
> Thanks!
>
> Abejide Ayodele
> It always seems impossible until it's done. --Nelson Mandela



Re: Issue with TCP splicing

2018-07-24 Thread Julien Semaan
> Sorry, that was a "can" that really meant "can't" :) I can't reproduce it.

Aw well, I was surprised it was so easy :)

> Can you try to upgrade to 1.8.12 ? A number of bugs have been fixed since
> 1.8.9.

I did try the upgrade to 1.8.12 and got the same results (segfault),
although I wasn't able to confirm that it segfaulted in the TCP splicing
code.

> What kind of load do you have when it segfaults ?

Far from enormous: at most 10 requests per second. But as I said in my
first post, the number of TCP retransmissions and resets is very large
because we're black-holing the traffic, since we use haproxy for our
captive portal.
I'd be happy to provide a pcap, but for privacy reasons I can't extract it
from a production environment, and I can't seem to replicate it in the lab.


Thanks!

--
Julien Semaan
jsem...@inverse.ca   ::  +1 (866) 353-6153 *155  ::www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)



On 2018-07-24 01:20 PM, Olivier Houchard wrote:
> Hi Julian,
>
> On Tue, Jul 24, 2018 at 12:58:27PM -0400, Julien Semaan wrote:
> > Hi Olivier,
> >
> > Glad you're able to replicate it because I can't get it to happen
> > consistently!
> > I'd be happy if you could share the details of how it could be replicated if
> > that's not too complex or hard to explain via email.
>
> Sorry, that was a "can" that really meant "can't" :) I can't reproduce it.
>
> > Anyway, attached to this email, you'll find the haproxy configuration that
> > has gotten the issue.
> > Also, a lua script we do use and that is referenced in the configuration.
> >
> > This is running on CentOS 7.5.1804 with kernel 3.10.0-862.6.3.el7.x86_64
> >
> > It is running through systemd, unit file attached to this email as well.
> >
> > Let me know if you need more infos, and I'll be glad to provide them or if
> > you have a pcap I can replay to generate this issue in our lab.
> >
> > Best Regards,
>
> Thanks a lot for the information!
> I'm now pretty sure it's not directly related to splicing, more likely some
> memory corruption.
> Can you try to upgrade to 1.8.12 ? A number of bugs have been fixed since
> 1.8.9.
> What kind of load do you have when it segfaults ?
>
> Thanks !
>
> Olivier




Re: Issue with TCP splicing

2018-07-24 Thread Olivier Houchard
Hi Julian,

On Tue, Jul 24, 2018 at 12:58:27PM -0400, Julien Semaan wrote:
> Hi Olivier,
> 
> Glad you're able to replicate it because I can't get it to happen
> consistently!
> I'd be happy if you could share the details of how it could be replicated if
> that's not too complex or hard to explain via email.

Sorry, that was a "can" that really meant "can't" :) I can't reproduce it.

> 
> Anyway, attached to this email, you'll find the haproxy configuration that
> has gotten the issue.
> Also, a lua script we do use and that is referenced in the configuration.
> 
> This is running on CentOS 7.5.1804 with kernel 3.10.0-862.6.3.el7.x86_64
> 
> It is running through systemd, unit file attached to this email as well.
> 
> Let me know if you need more infos, and I'll be glad to provide them or if
> you have a pcap I can replay to generate this issue in our lab.
> 
> Best Regards,
> 

Thanks a lot for the information!
I'm now pretty sure it's not directly related to splicing, more likely some
memory corruption.
Can you try to upgrade to 1.8.12 ? A number of bugs have been fixed since
1.8.9.
What kind of load do you have when it segfaults ?

Thanks !

Olivier



Re: [PATCH] MINOR: ssl: BoringSSL matches OpenSSL 1.1.0

2018-07-24 Thread Willy Tarreau
Hi Manu,

On Mon, Jul 23, 2018 at 06:12:34PM +0200, Emmanuel Hocdet wrote:
> Hi Willy,
> 
> This patch is necessary to build with current BoringSSL (SSL_SESSION is now 
> opaque).
> BoringSSL correctly matches OpenSSL 1.1.0 since 3b2ff028 for haproxy needs.
> The patch revert part of haproxy 019f9b10 (openssl-compat.h).
> This will not break openssl/libressl compat.

OK, but the chunk here seems to contradict this assertion :


@@ -119,13 +114,6 @@ static inline const OCSP_CERTID *OCSP_SINGLERESP_get0_id(const OCSP_SINGLERESP *
 }
 #endif
 
-#endif
-
-#if (OPENSSL_VERSION_NUMBER < 0x101fL) || defined(LIBRESSL_VERSION_NUMBER)
-/*
- * Functions introduced in OpenSSL 1.1.0 and not yet present in LibreSSL
- */
-
 static inline pem_password_cb *SSL_CTX_get_default_passwd_cb(SSL_CTX *ctx)
 {
return ctx->default_passwd_callback;

I'm seeing that libressl will now use different code, shared with
openssl, while you seem to have targeted boringssl only. Maybe
this part escaped from a larger patch that you used during development ?

Thanks,
Willy



Re: Issue with TCP splicing

2018-07-24 Thread Julien Semaan

Hi Olivier,

Glad you're able to replicate it because I can't get it to happen 
consistently!
I'd be happy if you could share the details of how it could be 
replicated if that's not too complex or hard to explain via email.


Anyway, attached to this email, you'll find the haproxy configuration 
that has gotten the issue.

Also, a lua script we do use and that is referenced in the configuration.

This is running on CentOS 7.5.1804 with kernel 3.10.0-862.6.3.el7.x86_64

It is running through systemd, unit file attached to this email as well.

Let me know if you need more infos, and I'll be glad to provide them or 
if you have a pcap I can replay to generate this issue in our lab.


Best Regards,

--
Julien Semaan
jsem...@inverse.ca   ::  +1 (866) 353-6153 *155  ::www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)



On 2018-07-24 12:28 PM, Olivier Houchard wrote:
> Hi Julian,
>
> On Mon, Jul 23, 2018 at 09:07:32AM -0400, Julien Semaan wrote:
> > Hi all,
> >
> > We're currently using haproxy in our project PacketFence
> > (https://packetfence.org) and are currently experiencing an issue with
> > haproxy segfaulting when TCP splicing is enabled.
> >
> > We're currently running version 1.8.9 and are occasionally getting segfaults
> > on this specific line in stream.c (line 2131):
> > (objt_cs(si_b->end) && __objt_cs(si_b->end)->conn->xprt &&
> > __objt_cs(si_b->end)->conn->xprt->snd_pipe) &&
> >
> > I wasn't too bright when I found it through gdb and forgot to copy the
> > backtrace, so I'm hoping that the issue can be found with this limited
> > information.
> >
> > After commenting out the code for TCP splicing with the patch attached to
> > the email, then the issue stopped happening.
> >
> > Best Regards,
>
> I can seem to reproduce this.
> Care to share your configuration and your setup ?
>
> Thanks !
>
> Olivier


-- Update host on the fly

core.register_action("change_host", { 'http-req'}, function(txn)
    if txn.sf:req_fhdr("Host") == nil then
        txn.set_var(txn,"req.host","wireless.zammitcorp.ca") -- Update host on the fly
    end
end)

-- Select backend based on Host header

local string = require("string");

core.register_action("select", { "http-req" }, function(txn)

    if ( string.match(txn.sf:path(), '/profile.xml')) then
        -- nothing, we let it go through for the XML profiles
    elseif ( not ( string.match(txn.sf:req_fhdr("Host"):lower(), '^192%.168%.2%.10[0-9:]*$') or string.match(txn.sf:req_fhdr("Host"):lower(), '^192%.168%.4%.10[0-9:]*$') or string.match(txn.sf:req_fhdr("Host"):lower(), '^10%.61%.126%.10[0-9:]*$') or string.match(txn.sf:req_fhdr("Host"):lower(), '^wireless%.zammitcorp%.ca[0-9:]*$') or false ) ) then
        txn:set_var("req.action","proxy")
    else
        if (txn.sf:path() == '/') then
            txn:set_var("req.action","proxy")
        elseif ( string.match(txn.sf:path(), '^/common') or string.match(txn.sf:path(), '^/content') or string.match(txn.sf:path(), '^/favicon.ico$') ) then
            txn:set_var("req.action","static")
        end
    end
end)

global
  external-check
  user haproxy
  group haproxy
  daemon
  pidfile /usr/local/pf/var/run/haproxy-portal.pid
  log /dev/log local0
  stats socket /tmp/proxystats level admin process 1
  maxconn 4000
  #Followup of https://github.com/inverse-inc/packetfence/pull/893
  #haproxy 1.6.11 | intermediate profile | OpenSSL 1.0.1e | SRC: https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy-1.6.11=1.0.1e=yes=intermediate
  #Oldest compatible clients: Firefox 1, Chrome 1, IE 7, Opera 5, Safari 1, Windows XP IE8, Android 2.3, Java 7
  tune.ssl.default-dh-param 2048
  ssl-default-bind-ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS
  ssl-default-bind-options no-sslv3 no-tls-tickets
  ssl-default-server-ciphers

Re: Issue with TCP splicing

2018-07-24 Thread Olivier Houchard
Hi Julian,

On Mon, Jul 23, 2018 at 09:07:32AM -0400, Julien Semaan wrote:
> Hi all,
> 
> We're currently using haproxy in our project PacketFence
> (https://packetfence.org) and are currently experiencing an issue with
> haproxy segfaulting when TCP splicing is enabled.
> 
> We're currently running version 1.8.9 and are occasionally getting segfaults
> on this specific line in stream.c (line 2131):
> (objt_cs(si_b->end) && __objt_cs(si_b->end)->conn->xprt &&
> __objt_cs(si_b->end)->conn->xprt->snd_pipe) &&
> 
> I wasn't too bright when I found it through gdb and forgot to copy the
> backtrace, so I'm hoping that the issue can be found with this limited
> information.
> 
> After commenting out the code for TCP splicing with the patch attached to
> the email, then the issue stopped happening.
> 
> Best Regards,
> 

I can seem to reproduce this.
Care to share your configuration and your setup ?

Thanks !

Olivier



Configuring HAProxy session limits

2018-07-24 Thread Àbéjídé Àyodélé
Hi Friends,

I am trying to bump session limits via the maxconn in the global section as
below:

cat /etc/haproxy/redacted-haproxy.cfg
global
  maxconn 1
  stats socket /var/run/redacted-haproxy-stats.sock user haproxy group haproxy mode 660 level operator expose-fd listeners

frontend redacted-frontend
  mode tcp
  bind :2004
  default_backend redacted-backend

backend redacted-backend
  mode tcp
  balance leastconn
  hash-type consistent

  server redacted_0 redacted01.qa:8443 check agent-check agent-port 8080 weight 100 send-proxy
  server redacted-684994ccd-6rn9q 192.168.39.223:8443 check port 8443 weight 100 send-proxy
  server redacted-684994ccd-c88d9 192.168.46.66:8443 check port 8443 weight 100 send-proxy
  server redacted-canary-58ccdb7cf4-47f4m 192.168.53.47:8443 check port 8443 weight 100 send-proxy

NOTE: I removed some portion of the config for conciseness sake.

However this did not seem to have any impact on HAProxy after a reload as
seen
below:

echo "show stat" | socat unix-connect:/var/run/redacted-haproxy-stats.sock stdio | cut -d"," -f7
slim
2000




200

I do not know where 2000 and 200 are coming from as I did not at any point
configure that, the maxconn was previously 4096.

A more detailed stats output is below:

echo "show stat" | socat unix-connect:/var/run/redacted-haproxy-stats.sock stdio
#
pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
redacted-frontend,FRONTEND,,,0,2,2000,3694,0,0,0,0,0,OPEN,1,2,00,3,0,9,,,0,0,0,,,0,0,0,0,tcp,,3,9,3694,,0,0,
redacted-backend,redacted_0,0,0,0,1,,2,0,0,,0,,0,0,0,0,UP,94,1,0,0,0,1582,0,,1,3,1,,2,,2,0,,1,L4OK,,0,,,0,0,683,,via agent : up,0,0,0,0,L7OK,0,50,Layer4 check passed,Layer7 check passed,2,3,4,1,1,1,10.185.57.54:8443,,tcp
redacted-backend,redacted-684994ccd-6rn9q,0,0,0,1,,46,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,2,,46,,2,0,,1,L4OK,,0,,,0,0,6,,,0,0,0,1Layer4 check passed,,2,3,4192.168.39.223:8443,,tcp
redacted-backend,redacted-684994ccd-c88d9,0,0,0,1,,45,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,3,,45,,2,0,,1,L4OK,,0,,,0,0,12,,,0,0,0,0Layer4 check passed,,2,3,4192.168.46.66:8443,,tcp
redacted-backend,redacted-canary-58ccdb7cf4-47f4m,0,0,0,1,,45,0,0,,0,,0,0,0,0,UP,100,1,0,0,0,1582,0,,1,3,4,,45,,2,0,,1,L4OK,,0,,,0,0,10,,,0,0,0,1Layer4 check passed,,2,3,4192.168.53.47:8443,,tcp
redacted-backend,BACKEND,0,0,0,2,200,3694,0,0,0,0,,0,0,0,0,UP,394,4,0,,0,1582,0,,1,3,0,,138,,1,3,,9,,0,0,0,0,0,0,6,,,0,0,0,1,,tcp,leastconn,,,

I need guidance on what I need to do to configure session limits correctly
and
also make it reflect in the exported metrics.

Thanks!

Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela


Re: [PATCH] MINOR: server: Don't make "server" in frontend fatal.

2018-07-24 Thread Willy Tarreau
On Tue, Jul 24, 2018 at 04:59:39PM +0200, Olivier Houchard wrote:
> Right now, when we have "server", "default-server", or "server-template"
> in a frontend, we warn about it being ignored, only to be considered fatal
> later.
> That sounds a bit silly, so the attached patch makes it non-fatal.

It doesn't only sound silly, it is. My fingers copy-pasted this exactly
9 years ago over hundreds of locations without asking for my permission!

Fix applied, thanks!
Willy



[PATCH] MINOR: server: Don't make "server" in frontend fatal.

2018-07-24 Thread Olivier Houchard
Hi,

Right now, when we have "server", "default-server", or "server-template"
in a frontend, we warn about it being ignored, only to be considered fatal
later.
That sounds a bit silly, so the attached patch makes it non-fatal.

Regards,

Olivier
>From 9d2ab5b57dd4d14bce82923cb9b35bb74ac642bb Mon Sep 17 00:00:00 2001
From: Olivier Houchard 
Date: Tue, 24 Jul 2018 16:48:59 +0200
Subject: [PATCH] BUG/MINOR: servers: Don't make "server" in a frontend fatal.

When parsing the configuration, if "server", "default-server" or
"server-template" are found in a frontend, we first warn that it will be
ignored, only to be considered a fatal error later. Be true to our word, and
just ignore it.

This should be backported to 1.8 and 1.7.
---
 src/server.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/server.c b/src/server.c
index d96edc77a..4498fd878 100644
--- a/src/server.c
+++ b/src/server.c
@@ -1937,7 +1937,7 @@ int parse_server(const char *file, int linenum, char **args, struct proxy *curpr
 			goto out;
 		}
 		else if (warnifnotcap(curproxy, PR_CAP_BE, file, linenum, args[0], NULL))
-			err_code |= ERR_ALERT | ERR_FATAL;
+			err_code |= ERR_WARN;
 
 		/* There is no mandatory first arguments for default server. */
 		if (srv) {
-- 
2.14.3



Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-24 Thread Willy Tarreau
Hi Milan,

On Tue, Jul 24, 2018 at 12:23:37PM +0200, Milan Petruzelka wrote:
> Hi Willy,
> 
> > Do you *think* that you got less CLOSE_WAITs or that the latest fixes
> > didn't change anything ? I suspect that for some reason you might be
> > hit by several bugs, which is what has complicated the diagnostic, but
> > that's just pure guess.
> >
> >
> I'm not sure. I left patched haproxy running over last weekend. I have a
> slight feeling that there was less hanging connections than over last week,
> but it could be because of lower weekend traffic.

OK, it's not that obvious at least.

> Now i'm running latest
> GIT version (1.8.12-5e100b-15) and i'll compare the speed of blocked
> connections increase against vannila 1.8.12 from last week.
> 
> I'll send more extended "show fd" dumps as soon as i catch some more.
> Please let me know If you add more h2 state info into 1.8 git version and
> i'll run it in production to get more info.

So I'm having one update to emit the missing info on "show fd" (patch merged
and pushed already, that I'm attaching here if it's easier for you), and
two other ones which fix a situation which can definitely cause this to
happen, but which I still fail to reproduce using the various tools I
have (curl, h2c, nghttp, ...). The case is the following and derives from
your previous capture :

  - multiple streams fight for the mux and are queued in the send_list
  - at this point the mux has to emit a GOAWAY for a reason that's still
    to be figured out (probably it received a bad message but we could
    guess anything)
  - the streams are woken up, notified about the error
  - h2_detach() is called for each of them
  - they are detached from the h2 stream (struct h2s technically
    speaking, which is the internal representation of the h2 stream
    state)
  - since the streams are marked as blocked for some room, they are orphaned
    and nothing more is done on them.
  - at this point, any activity on the connection goes through h2_wake()
    which sees the connection in ERROR2 state, tries again to release the
    streams, cannot, and stops polling.

=> from this point, no more events can be received on the connection, and
   the streams remain orphaned forever. This case was partially addressed
   by the patch I sent a while ago but I wasn't completely sure about the
   sequence that could lead to this. I'm attaching this patch (0001-WIP-*).

However this patch alone (0001-WIP-*) cannot prevent the streams from being
orphaned (6th step above) so while it's needed, it's not enough. The third
one is required for this (h2-error.diff). 

And I *think* (and hope) that with these 2 patches on top of latest 1.8
we're OK now. What I would appreciate quite a lot if you're willing to
let me abuse your time is to either git pull or apply
0001-MINOR-h2-add-the-error-code-and-the-max-last-stream-.patch on top
of your up-to-date branch, then apply
0001-WIP-h2-try-to-address-possible-causes-for-the-close_.patch then
apply h2-error.diff and test again. The last two will apply with an
offset but that's not a problem.

If you still see some close_waits (I really hope not), it means they're
caused by something totally different and then "show fd" will help better
than previously thanks to the extra info in the first patch.
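
When collecting such dumps, a small script can help spot connections that
match the pattern discussed in this thread. The Python sketch below parses
the key=value fields of one "show fd" line; the sample line is the one
quoted in this thread, joined onto a single line. The function names and
the looks_stuck() heuristic (H2 mux, st0=7, no conn_stream, orphaned
streams) are assumptions made for illustration, not an official check.

```python
import re

# The "show fd" line quoted in this thread, joined onto one line.
LINE = (
    "25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x24f0a70 "
    "iocb=0x4d34c0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300 "
    "fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x1000 nbst=8 nbcs=0 "
    "fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 dbuf=0/0 mbuf=0/16384"
)

H2_CS_ERROR2 = 7  # st0 value described as H2_CS_ERROR2 in this thread


def parse_fields(line):
    """Return the key=value pairs of a "show fd" line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def looks_stuck(fields):
    """Heuristic only: an H2 connection in ERROR2 state with no conn_stream
    attached but with orphaned streams still present."""
    return (
        fields.get("mux") == "H2"
        and int(fields.get("st0", -1)) == H2_CS_ERROR2
        and int(fields.get("nbcs", -1)) == 0
        and int(fields.get("orph_cnt", 0)) > 0
    )


fields = parse_fields(LINE)
print(fields["st0"], fields["orph_cnt"], looks_stuck(fields))
```

Fields added by the first patch (err, maxid, lastid) would show up in the
same dict without any change to the parser.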

I think we're about to nail it down, to be honest ;-)

Thanks!
Willy
>From 12a4b5c69d3eb0cc01bf92df215c7cc8d70ee8bc Mon Sep 17 00:00:00 2001
From: Willy Tarreau 
Date: Tue, 24 Jul 2018 14:12:42 +0200
Subject: MINOR: h2: add the error code and the max/last stream IDs to "show
 fd"

This is intented to help debugging H2 in field.

(cherry picked from commit 616ac81dec5759990ee600047d8ad900f6eba6e8)
[wt: adapted context a little bit]
Signed-off-by: Willy Tarreau 
---
 src/mux_h2.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/mux_h2.c b/src/mux_h2.c
index 7f14de4..95effb9 100644
--- a/src/mux_h2.c
+++ b/src/mux_h2.c
@@ -3515,8 +3515,11 @@ static void h2_show_fd(struct chunk *msg, struct connection *conn)
 		node = eb32_next(node);
 	}
 
-	chunk_appendf(msg, " st0=%d flg=0x%08x nbst=%u nbcs=%u fctl_cnt=%d send_cnt=%d tree_cnt=%d orph_cnt=%d dbuf=%u/%u mbuf=%u/%u",
-		      h2c->st0, h2c->flags, h2c->nb_streams, h2c->nb_cs, fctl_cnt, send_cnt, tree_cnt, orph_cnt, h2c->dbuf->i, h2c->dbuf->size, h2c->mbuf->o, h2c->mbuf->size);
+	chunk_appendf(msg, " st0=%d err=%d maxid=%d lastid=%d flg=0x%08x nbst=%u nbcs=%u"
+		      " fctl_cnt=%d send_cnt=%d tree_cnt=%d orph_cnt=%d dbuf=%u/%u mbuf=%u/%u",
+		      h2c->st0, h2c->errcode, h2c->max_id, h2c->last_sid, h2c->flags,
+		      h2c->nb_streams, h2c->nb_cs, fctl_cnt, send_cnt, tree_cnt, orph_cnt,
+		      h2c->dbuf->i, h2c->dbuf->size, h2c->mbuf->o, h2c->mbuf->size);
 }
 
 /***/
-- 
1.7.12.1

>From f055296f7598ba84b08e78f8309ffd7fa0c9522b Mon Sep 17 

Re: Connections stuck in CLOSE_WAIT state with h2

2018-07-24 Thread Milan Petruželka
Hi Willy,

> Do you *think* that you got less CLOSE_WAITs or that the latest fixes
> didn't change anything ? I suspect that for some reason you might be
> hit by several bugs, which is what has complicated the diagnostic, but
> that's just pure guess.
>
>
I'm not sure. I left the patched haproxy running over last weekend. I have a
slight feeling that there were fewer hanging connections than over last week,
but it could be because of lower weekend traffic. Now I'm running the latest
GIT version (1.8.12-5e100b-15) and I'll compare the rate at which blocked
connections increase against vanilla 1.8.12 from last week.

> Oh I'm just seeing you already did that in the next e-mail. Thank you :-)
>

I'll send more extended "show fd" dumps as soon as I catch some more.
Please let me know if you add more h2 state info into the 1.8 git version and
I'll run it in production to get more info.

> So we have this :
>
>  25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x24f0a70 iocb=0x4d34c0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300 fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x1000 nbst=8 nbcs=0 fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 dbuf=0/0 mbuf=0/16384
>
>
>   - st0=7 => H2_CS_ERROR2 : an error was sent, either it succeeded or
> could not be sent and had to be aborted nonetheless ;
>
>   - flg=1000 => H2_CF_GOAWAY_SENT : the GOAWAY frame was sent to the mux
> buffer.
>
>   - nbst=8 => 8 streams still attached
>
>   - nbcs=0 => 0 conn_streams found (application layer detached or not
> attached yet)
>
>   - send_cnt=8 => 8 streams still in the send_list, waiting for the mux
> to pick their contents.
>
>   - tree_cnt=8 => 8 streams known in the tree (hence they are still valid
> from the H2 protocol perspective)
>
>   - orph_cnt=8 => 8 streams are orphaned : these streams have quit at the
> application layer (very likely a timeout).
>
>   - mbuf=0/16384 : the mux buffer is empty but allocated. It's not very
> common.
>
> At this point what it indicates is that :
>   - 8 streams were active on this connection and a response was sent (at
> least partially) and probably waited for the mux buffer to be empty
> due to data from other previous streams. I'm realising it would be
> nice to also report the highest stream index to get an idea of the
> number of past streams on the connection.
>
>   - an error happened (protocol error, network issue, etc, no more info
> at the moment) and caused haproxy to emit a GOAWAY frame. While doing
> so, the pending streams in the send_list were not destroyed.
>
>   - then for an unknown reason the situation doesn't move anymore. I'm
> realising that one case I figured in the past with an error possibly
> blocking the connection at least partially covers one point here, it
> causes the mux buffer to remain allocated, so this patch would have
> caused it to be released, but it's still incomplete.
>
> Now I have some elements to dig through, I'll try to mentally reproduce
> the complex sequence of a blocked response with a GOAWAY being sent at
> the same time to see what happens.
>
>
Thanks a lot for a detailed description.
Milan


Re: Suppression de l’extension

2018-07-24 Thread --
Hello

It works with :

>  rewrite ^(/.*)\.html  https://$host/$1 permanent;


Thank you :) 


Sent from my iPhone

> On 24 Jul 2018, at 08:49, Aleksandar Lazic wrote:
> 
> Hi.
> 
>> On 20/07/2018 11:16, -- wrote:
>> Hello,
>> 
>> In fact, I just want the .html extension to no longer be displayed on
>> my site
>> 
>> I use haproxy with Nginx, I can make url rewrite with Nginx that works well:
>> 
>> server {
>> rewrite ^(/.*)\.html(\?.*)?$ $1$2 permanent;
> 
> How about to use this in the rewrite?
> 
>  rewrite ^(/.*)\.html(\?.*)?$ https://$host/$1$2 permanent;
>  
>> index index.html;
>> try_files $uri.html $uri/ $uri =404;
>> }
> 
> Best regards
> Aleks
> 
>> But since Nginx runs on port 8889 (on the same machine as Haproxy) it
>> redirects me to this port and I lose the connection (haproxy is
>> listening on port 443)
>> 
>> I wish to do this with Haproxy, is it possible?
>> 
>> example:
>> 
>> https://site.com/index.html -> https://site.com/index (the resource without 
>> the .html does not exist but i want it to be displayed like this in the 
>> browser)
>> 
>> 
>> Thank you
>> 
>> Sent from my iPhone
>> 
>>> On 19 Jul 2018, at 23:48, Aleksandar Lazic wrote:
>>> 
>>> Hi.
>>> 
>>>> On 19/07/2018 15:09, -- wrote:
>>>> Hello,
>>>> 
>>>> I would like to remove the extension presented by my nginx server, but
>>>> from Haproxy
>>>> 
>>>> For example:
>>>> 
>>>> A.com/index.html to A.com/index
>>>> 
>>>> Is this possible?
>>> 
>>> Maybe, but please can you ask in English, thanks.
>>> 
>>> But let me try to interpret your question.
>>> 
>>> reqrep ^([^\ :]*)\ /index.html \1\ /index
>>> 
>>> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4.2-reqrep
>>> 
>>>> Thanks
>>>> 
>>>> Sent from my iPhone
>>> 
>>> Best regards
>>> Aleks


Re: Suppression de l’extension

2018-07-24 Thread Aleksandar Lazic

Hi.

On 20/07/2018 11:16, -- wrote:
> Hello,
>
> In fact, I just want the .html extension to no longer be displayed on
> my site
>
> I use haproxy with Nginx, I can make url rewrite with Nginx that works well:
>
> server {
>  rewrite ^(/.*)\.html(\?.*)?$ $1$2 permanent;

How about using this in the rewrite?

  rewrite ^(/.*)\.html(\?.*)?$ https://$host/$1$2 permanent;

>  index index.html;
>  try_files $uri.html $uri/ $uri =404;
> }
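
To see what the two capture groups of that pattern hold, here is a small
Python sketch. Python's re module is used only as a stand-in for the regex
engine in nginx, and strip_html and its prefix argument are names made up
for this illustration. It also shows that, because group 1 already starts
with "/", a prefix ending in "/" yields a doubled slash.

```python
import re

# Same pattern as in the rewrite above: group 1 is the path without
# ".html", group 2 is an optional "?query" suffix.
HTML_SUFFIX = re.compile(r"^(/.*)\.html(\?.*)?$")


def strip_html(uri, prefix=""):
    """Return prefix + uri without its .html extension; URIs that do not
    end in .html are returned unchanged."""
    m = HTML_SUFFIX.match(uri)
    if m is None:
        return uri
    return prefix + m.group(1) + (m.group(2) or "")


print(strip_html("/index.html"))
print(strip_html("/a/b.html?x=1"))
print(strip_html("/index.html", "https://site.com"))
print(strip_html("/index.html", "https://site.com/"))
```

The last call mimics the "https://$host/$1$2" form and produces a double
slash after the host name.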


Best regards
Aleks


> But since Nginx runs on port 8889 (on the same machine as Haproxy) it
> redirects me to this port and I lose the connection (haproxy is
> listening on port 443)
>
> I wish to do this with Haproxy, is it possible?
>
> example:
>
> https://site.com/index.html -> https://site.com/index (the resource without the
> .html does not exist but i want it to be displayed like this in the browser)
>
> Thank you
>
> Sent from my iPhone
>
>> On 19 Jul 2018, at 23:48, Aleksandar Lazic wrote:
>>
>> Hi.
>>
>>> On 19/07/2018 15:09, -- wrote:
>>> Hello,
>>>
>>> I would like to remove the extension presented by my nginx server, but
>>> from Haproxy
>>>
>>> For example:
>>>
>>> A.com/index.html to A.com/index
>>>
>>> Is this possible?
>>
>> Maybe, but please can you ask in English, thanks.
>>
>> But let me try to interpret your question.
>>
>> reqrep ^([^\ :]*)\ /index.html \1\ /index
>>
>> https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4.2-reqrep
>>
>>> Thanks
>>>
>>> Sent from my iPhone
>>
>> Best regards
>> Aleks




Re: Regexp

2018-07-24 Thread Frederic Lecaille

On 07/20/2018 12:03 AM, Aleksandar Lazic wrote:
> Hi.
>
> On 18/07/2018 13:10, Haim Ari wrote:
>> Hello,
>>
>> Trying to set backend by regexp
>>
>> This regexp works outside of haproxy
>>
>> String:
>>
>> /1.0/manage/bu/ca?token=68bf68bf68bf68bf68bf=1212121212=123456789
>>
>> Regexp:
>>
>> ^\/1\.0\/manage\/bu\/ca\?token=.*.segId=.*=123456789
>>
>> What is the right syntax for this in haproxy ?
>
> I would use
>
> https://regex101.com/r/TjH7Ul/1/
>
> ^\/1\.0\/manage\/bu\/ca\?token=(.*).segId=(.*).partner=123456789


AFAIK, even if this is correct, you do not have to escape the '/' 
characters to match them. You had to do that in your GUI because you 
selected a regex form with '/' as delimiter character (/.../gm).


haproxy uses POSIX regexes, in which ^.[$()|*+?{\ is the list of
characters that must be escaped if you want them to be interpreted as
literal characters (see regex(7)).


There is an explanation in your GUI which indicates exactly that:

"\/ matches the character / literally (case sensitive)"

So, the regex above may be shortened as follows:

^/1\.0/manage/bu/ca\?token=(.*).segId=(.*).partner=123456789

which is a bit more readable.
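
As a quick check that the escaped and the shortened forms behave the same,
here is a small Python sketch. Python's re module is not the POSIX engine
haproxy uses, so this only illustrates that "\/" and "/" match the same
character. The sample URI is hypothetical: its "&segId=" and "&partner="
separators and its values are assumptions made for this illustration.

```python
import re

# Hypothetical request URI; separators and values are assumed.
uri = "/1.0/manage/bu/ca?token=68bf68bf&segId=1212121212&partner=123456789"

escaped = r"^\/1\.0\/manage\/bu\/ca\?token=(.*).segId=(.*).partner=123456789"
shortened = r"^/1\.0/manage/bu/ca\?token=(.*).segId=(.*).partner=123456789"

m_escaped = re.search(escaped, uri)
m_short = re.search(shortened, uri)

# Both patterns match and capture the same two groups.
print(m_escaped.groups() == m_short.groups())
print(m_short.groups())
```

The "?" after "ca" and the "." in "1.0" still need a backslash, since both
are in the list of special characters above.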

Fred.



Re: Issue with TCP splicing

2018-07-24 Thread Aleksandar Lazic

Hi Julien.

On 23/07/2018 13:59, Julien Semaan wrote:

> Doing it with the patch does the equivalent of disabling it with the
> option (realized there was an option afterwards).
>
> We're more looking to know if the haproxy team is interested in getting
> the issue addressed more than just getting the workaround



From my experience with the haproxy team, I would say yes.


Such cases are not really easy to track down, because the developer must
be able to reproduce the behaviour, which can take some time. If it turns
out to be specific to your environment, it will also require some data
and time from you and your team.

As you can see in the mailing list archive, there are a lot of
long-running discussion threads about solving some specific and some
common bugs ;-)

https://www.mail-archive.com/haproxy@formilux.org/

As I'm just a member of the community and not a regular developer, I can
only invite you to help us solve this issue. I also don't know how
difficult or long-running this debugging session will be, so let me ask
you for some patience if it takes some time to debug and solve the issue.

To start with, it helps to know the full version, the config, and a
"bt full" from core dumps of a debug-compiled version.

One question from my experience with enterprise vendor support:
can you reproduce the behaviour with the latest version? ;-)



Thanks!

--
Julien Semaan


Best regards
Aleks


jsem...@inverse.ca   ::  +1 (866) 353-6153 *155  ::www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)

On 2018-07-23 11:25 AM, Aleksandar Lazic wrote:

Hi Julien.

On 23/07/2018 09:07, Julien Semaan wrote:

Hi all,

We're currently using haproxy in our project PacketFence 
(https://packetfence.org) and are currently experiencing an issue 
with haproxy segfaulting when TCP splicing is enabled.


We're currently running version 1.8.9 and are occasionally getting 
segfaults on this specific line in stream.c (line 2131):
(objt_cs(si_b->end) && __objt_cs(si_b->end)->conn->xprt && 
__objt_cs(si_b->end)->conn->xprt->snd_pipe) &&


I wasn't too bright when I found it through gdb and forgot to copy 
the backtrace, so I'm hoping that the issue can be found with this 
limited information.


After commenting out the code for TCP splicing with the patch 
attached to the email, then the issue stopped happening.


Have you tried to disable splice via config?

https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#nosplice


Best Regards,

--
Julien Semaan
jsem...@inverse.ca   ::  +1 (866) 353-6153 *155 ::www.inverse.ca
Inverse inc. :: Leaders behind SOGo (www.sogo.nu) and PacketFence 
(www.packetfence.org)


Best regards
aleks