I'm not sure how to interpret this, but it appears that haproxy is
intermittently dropping client payload (1/100).  I have included a tcpdump
capture and logs to show what is happening.

Am I doing something wrong?  I have no idea what could be causing this or
how to go about debugging it.  I cannot reproduce it, but I do observe it
in production about 2 times a day across 20 instances and 2K connections.

Any help or advice would be greatly appreciated.



What I'm trying to accomplish is high availability over two routes
(i.e. two internet providers).  One acts as primary, and I gave it a
"static-rr" "weight" of 256; the other acts as backup and has a "weight"
of 1.  The backup should only be used if the primary fails.


log:
Apr  4 18:55:27 app055 haproxy[13666]: 127.0.0.1:42262 [04/Apr/2017:18:54:41.585] ws-local servers/server1 1/86/45978 4503 5873 -- 0/0/0/0/0 0/0
Apr  4 22:46:37 app055 haproxy[13666]: 127.0.0.1:47130 [04/Apr/2017:22:46:36.931] ws-local servers/server1 1/62/663 7979 517 -- 0/0/0/0/0 0/0
Apr  4 22:46:38 app055 haproxy[13666]: 127.0.0.1:32931 [04/Apr/2017:22:46:37.698] ws-local servers/server1 1/55/405 3062 553 -- 1/1/1/1/0 0/0
Apr  4 22:46:43 app055 haproxy[13666]: 127.0.0.1:41748 [04/Apr/2017:22:46:43.190] ws-local servers/server1 1/115/452 7979 517 -- 2/2/2/2/0 0/0
Apr  4 22:46:46 app055 haproxy[13666]: 127.0.0.1:57226 [04/Apr/2017:22:46:43.576] ws-local servers/server1 1/76/3066 2921 538 -- 1/1/1/1/0 0/0
Apr  4 22:46:47 app055 haproxy[13666]: 127.0.0.1:39656 [04/Apr/2017:22:46:47.072] ws-local servers/server1 1/67/460 8254 528 -- 1/1/1/1/0 0/0
Apr  4 22:47:38 app055 haproxy[13666]: 127.0.0.1:39888 [04/Apr/2017:22:46:38.057] ws-local servers/server1 1/63/60001 0 0 cD 0/0/0/0/0 0/0
Apr  5 08:44:55 app055 haproxy[13666]: 127.0.0.1:42650 [05/Apr/2017:08:44:05.529] ws-local servers/server1 1/53/49645 4364 4113 -- 0/0/0/0/0 0/0
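
The affected session in the log above is the one with termination state
"cD" and 0 bytes in both directions.  To pick such sessions out of a larger
log, something like the following sketch might help.  It assumes each entry
sits on a single line (as haproxy writes it) and follows the log-format
from the config in this mail; the names LOG_RE, suspicious, SAMPLE_OK and
SAMPLE_BAD are made up for illustration.

```python
import re

# Matches the custom log-format used in this mail (an assumption if yours differs):
#   %ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %U %ts %ac/%fc/%bc/%sc/%rc %sq/%bq
LOG_RE = re.compile(
    r"(?P<client>\S+:\d+) "
    r"\[(?P<accept>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes_read>\+?\d+) (?P<bytes_uploaded>\d+) "
    r"(?P<term_state>\S\S) "
)

# Two entries copied from the log above, one normal and one affected.
SAMPLE_OK = ("Apr  4 22:46:37 app055 haproxy[13666]: 127.0.0.1:47130 "
             "[04/Apr/2017:22:46:36.931] ws-local servers/server1 "
             "1/62/663 7979 517 -- 0/0/0/0/0 0/0")
SAMPLE_BAD = ("Apr  4 22:47:38 app055 haproxy[13666]: 127.0.0.1:39888 "
              "[04/Apr/2017:22:46:38.057] ws-local servers/server1 "
              "1/63/60001 0 0 cD 0/0/0/0/0 0/0")


def suspicious(line):
    """Return the parsed fields if the session did not end with '--'
    and logged 0 bytes in both directions; otherwise return None."""
    m = LOG_RE.search(line)
    if m is None:
        return None  # not a line in the expected format
    fields = m.groupdict()
    if (fields["term_state"] != "--"
            and int(fields["bytes_read"]) == 0
            and int(fields["bytes_uploaded"]) == 0):
        return fields
    return None


for entry in (SAMPLE_OK, SAMPLE_BAD):
    hit = suspicious(entry)
    if hit is not None:
        print(hit["accept"], hit["client"], hit["term_state"], hit["tt"])
```

The filter is deliberately narrow: it only flags the pattern seen here, not
every abnormal termination state.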


tcpdump:
22:46:38.057127 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [S], seq 2113072542, win 43690, options [mss 65495,sackOK,TS val 82055529 ecr 0,nop,wscale 7], length 0
22:46:38.057156 IP 127.0.0.1.9011 > 127.0.0.1.39888: Flags [S.], seq 3284611992, ack 2113072543, win 43690, options [mss 65495,sackOK,TS val 82055529 ecr 82055529,nop,wscale 7], length 0
22:46:38.057178 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [.], ack 1, win 342, options [nop,nop,TS val 82055529 ecr 82055529], length 0
22:46:38.057295 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [S], seq 333335567, win 29200, options [mss 1460,sackOK,TS val 82055529 ecr 0,nop,wscale 7], length 0
22:46:38.060539 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [P.], seq 1:199, ack 1, win 342, options [nop,nop,TS val 82055530 ecr 82055529], length 198
22:46:38.060598 IP 127.0.0.1.9011 > 127.0.0.1.39888: Flags [.], ack 199, win 350, options [nop,nop,TS val 82055530 ecr 82055530], length 0
... client payload acked ...
22:46:38.120527 IP 99.99.99.99.8000 > 10.10.10.10.34289: Flags [S.], seq 4125907118, ack 333335568, win 28960, options [mss 1460,sackOK,TS val 662461622 ecr 82055529,nop,wscale 8], length 0
22:46:38.120619 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [.], ack 1, win 229, options [nop,nop,TS val 82055545 ecr 662461622], length 0
... idle timeout by server 5 seconds later ...
22:46:43.183207 IP 99.99.99.99.8000 > 10.10.10.10.34289: Flags [F.], seq 1, ack 1, win 114, options [nop,nop,TS val 662466683 ecr 82055545], length 0
22:46:43.183387 IP 127.0.0.1.9011 > 127.0.0.1.39888: Flags [F.], seq 1, ack 199, win 350, options [nop,nop,TS val 82056810 ecr 82055530], length 0
22:46:43.184011 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [.], ack 2, win 229, options [nop,nop,TS val 82056811 ecr 662466683], length 0
22:46:43.184025 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [.], ack 2, win 342, options [nop,nop,TS val 82056811 ecr 82056810], length 0
22:46:43.184715 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [P.], seq 199:206, ack 2, win 342, options [nop,nop,TS val 82056811 ecr 82056810], length 7
22:46:43.184795 IP 127.0.0.1.9011 > 127.0.0.1.39888: Flags [.], ack 206, win 350, options [nop,nop,TS val 82056811 ecr 82056811], length 0
22:46:43.184849 IP 127.0.0.1.39888 > 127.0.0.1.9011: Flags [F.], seq 206, ack 2, win 342, options [nop,nop,TS val 82056811 ecr 82056811], length 0
22:46:43.184877 IP 127.0.0.1.9011 > 127.0.0.1.39888: Flags [.], ack 207, win 350, options [nop,nop,TS val 82056811 ecr 82056811], length 0
22:47:38.058683 IP 10.10.10.10.34289 > 99.99.99.99.8000: Flags [F.], seq 1, ack 2, win 229, options [nop,nop,TS val 82070529 ecr 662466683], length 0
22:47:38.116336 IP 99.99.99.99.8000 > 10.10.10.10.34289: Flags [R], seq 4125907120, win 0, length 0
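
To line the capture up with the log entry for 127.0.0.1:39888, the
intervals between a few packets can be computed.  This is only a sketch:
the timestamps are copied by hand from the capture above, and the event
names (and the ms_between helper) are an interpretation of the packets, not
something tcpdump reports.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"

# Timestamps copied from the capture above; labels are hypothetical names.
EVENTS = {
    "client_syn":            "22:46:38.057127",  # client -> haproxy SYN
    "server_syn":            "22:46:38.057295",  # haproxy -> server SYN
    "client_payload":        "22:46:38.060539",  # 198 bytes from client
    "server_synack":         "22:46:38.120527",  # server -> haproxy SYN-ACK
    "server_fin":            "22:46:43.183207",  # server closes idle connection
    "haproxy_fin_to_server": "22:47:38.058683",  # haproxy finally closes
}


def ms_between(start, end):
    """Milliseconds elapsed from event `start` to event `end`."""
    a = datetime.strptime(EVENTS[start], FMT)
    b = datetime.strptime(EVENTS[end], FMT)
    return (b - a) / timedelta(milliseconds=1)


connect_ms = ms_between("server_syn", "server_synack")
payload_before_connect_ms = ms_between("client_payload", "server_synack")
server_idle_ms = ms_between("server_synack", "server_fin")
total_ms = ms_between("client_syn", "haproxy_fin_to_server")

print(f"server connect:            {connect_ms:.3f} ms")
print(f"payload ahead of SYN-ACK:  {payload_before_connect_ms:.3f} ms")
print(f"server idle before FIN:    {server_idle_ms:.3f} ms")
print(f"total session:             {total_ms:.3f} ms")
```

The connect interval comes out at about 63 ms and the total at about
60001 ms, which agree with the 1/63/60001 (%Tw/%Tc/%Tt) timers in the log
and with "timeout client 60000ms".  The 198-byte client payload reaches
haproxy roughly 60 ms before the server's SYN-ACK, i.e. while the server
connection is still being established, and no data packet towards
99.99.99.99 appears in the capture.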


config:
global
    daemon
    maxconn 10
    log /dev/log local0
    stats socket /dev/shm/haproxy.sock mode 666 level admin

defaults
    log global
    option tcplog
    log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %U %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"
    option log-health-checks
    option redispatch
    mode tcp
    retries 3
    timeout check 900ms
    timeout connect 500ms
    timeout queue 2s
    timeout client 60000ms
    timeout server 60000ms

resolvers mydns1
    nameserver dns2 8.8.4.4:53
    resolve_retries       50000
    timeout retry         5s
    hold other           30s
    hold refused         30s
    hold nx              30s
    hold timeout         30s
    hold valid           10s

resolvers mydns2
    nameserver dns3 172.31.0.254:53
    resolve_retries      1000
    timeout retry        10s
    hold other           30s
    hold refused         30s
    hold nx              30s
    hold timeout         30s
    hold valid           10s

frontend ws-local
    bind *:9011
    default_backend servers

backend servers
    balance static-rr
    default-server rise 1 inter 1h fastinter 10s downinter 10s error-limit 1
    server server1 ssl.somedomain.com:8000 init-addr 127.0.0.1 check observe layer4 weight 256 resolvers mydns1
    server server2 ssl.somedomain.com:8000 init-addr 127.0.0.1 check observe layer4 weight   1 resolvers mydns2 source 172.31.0.1



$ haproxy -vv
HA-Proxy version 1.7.3-1ppa1~trusty 2017/03/01
Copyright 2000-2017 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
  OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [COMP] compression
        [TRACE] trace
        [SPOE] spoe


--
lfs
