> OK these ones are interesting. First, none of the CLOSE_WAIT connections in the
> netstat file is present in the "show sess" output. Might it be possible that
> these ones are left-overs from a previous run ? Could you please double-check
> that you don't have an older process still running ?

I collected more data. Before 08:24:30 PM the data is normal; after that I
began the benchmark. Here is the data:
https://www.dropbox.com/s/yu2jsirv104ytwp/1.7.5-tcp-2?dl=0
https://www.dropbox.com/s/8dap4kus5sj6by7/1.7.5-sess-2?dl=0

Below is the data during the benchmark:
maxsock = 1000036; maxconn = 500000; maxpipes = 0
current conns = 7488; current pipes = 0/0; conn rate = 322/sec
Running tasks: 7366/7449; idle = 0 %

> Second, for an unknown reason, the source address reported in "show sess"
> backends shows the CDN instead of haproxy. I double-checked the code to be
> sure and it's the correct address that is reported. Are you using some form
> of transparent proxying that didn't appear in your first config, like
> "usesrc clientip" ?

Sorry, that was my mistake: I replaced the confidential IP incorrectly. It
should be the haproxy IP.

> Third, you have countless SYN_RECV on port 443. There can be two
> explanations :
>   - too small a socket backlog
>   - high loss rate between the CDN and your haproxy.

Even with the backlog set as large as 40000, it still overflowed quickly. The
latency between the CDN and haproxy is around 33 ms, and there was no packet
loss after sending 1000+ ICMP packets.

> The previous dumps used to report a lot of FIN_WAIT, indicating a difficulty
> to get ACKs back from the CDN which could thus also fuel the packet loss
> hypothesis, but on this test there are much less, about 0.2% which thus seems
> completely normal. However it's still important to keep this hypothesis in mind
> as it could also explain your huge number of concurrent connections.
>
> Regardless, I'm seeing one strange thing. There are a lot of "state=DIS" in
> the output. This is a transient state which should not remain for a long time,
> it's used to report an imminent close. It indicates that the client-facing
> connection was just terminated, but not executed.
> Strangely, these tasks are present in the run queue but were delayed. Would
> you happen to build with Clang ?
> We've faced an integer overflow bug on very
> recent versions.

I installed the 1.7.5 version using this RPM spec:
https://github.com/DBezemer/rpm-haproxy

On Mon, Apr 24, 2017 at 6:16 PM, Willy Tarreau <w...@1wt.eu> wrote:

> On Mon, Apr 24, 2017 at 05:29:14PM +0800, jaseywang wrote:
> > Hi,
> >
> > Here is the 1.7.5 output with the CDN. Timestamps are in the file: before
> > 05:22:00 PM there are no requests; I began the benchmark at 05:22:00 PM, so
> > you can check from 05:22:00 PM.
> > 61.155.222.157 is the CDN itself.
> >
> >
> > The files are large; here are the download links:
> > https://www.dropbox.com/s/yrv7l3m8hw32rr9/1.7.5-sess?dl=0
> > https://www.dropbox.com/s/pb7zglhnyovo79f/1.7.5-tcp?dl=0
>
> OK these ones are interesting. First, none of the CLOSE_WAIT connections in the
> netstat file is present in the "show sess" output. Might it be possible that
> these ones are left-overs from a previous run ? Could you please double-check
> that you don't have an older process still running ?
>
> Second, for an unknown reason, the source address reported in "show sess"
> backends shows the CDN instead of haproxy. I double-checked the code to be
> sure and it's the correct address that is reported. Are you using some form
> of transparent proxying that didn't appear in your first config, like
> "usesrc clientip" ?
>
> Third, you have countless SYN_RECV on port 443. There can be two
> explanations :
>   - too small a socket backlog
>   - high loss rate between the CDN and your haproxy.
>
> The previous dumps used to report a lot of FIN_WAIT, indicating a difficulty
> to get ACKs back from the CDN which could thus also fuel the packet loss
> hypothesis, but on this test there are much less, about 0.2% which thus seems
> completely normal. However it's still important to keep this hypothesis in mind
> as it could also explain your huge number of concurrent connections.
>
> Regardless, I'm seeing one strange thing. There are a lot of "state=DIS" in
> the output.
> This is a transient state which should not remain for a long time,
> it's used to report an imminent close. It indicates that the client-facing
> connection was just terminated, but not executed.
>
> Strangely, these tasks are present in the run queue but were delayed. Would
> you happen to build with Clang ? We've faced an integer overflow bug on very
> recent versions.
>
> Willy