2011/6/11 Matt Christiansen <ad...@nikore.net>:
> That's good to know. While 2000 concurrent connections is what we do
> right now, it will be closer to 10,000 concurrent connections come the
> holiday season, which is closer to 2.5 GB of RAM (still less than what's
> on the server).
>
> One thought I have: our requests can be very large at times (big
> headers, super huge cookies), so it may not be packet loss that the
> bigger buffer is fixing, but rather a better ability to buffer our large
> requests. That might explain why nginx wasn't showing this issue
> whereas haproxy was.
>
> We don't have any HP servers or Broadcom NICs (all Intel). I too have
> had a lot of issues in general with both HP and Broadcom, and chose
> hardware for our LB that didn't have those NICs.
>
> Our switches are new but not super high quality (Netgears); it's
> possible they are not performing as well as we would like. I'll have to
> do some more tests on them.

I have already experienced some negotiation problems with Netgears. Have
you tried to force the media on the NICs?
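For the record, forcing the media is usually done with ethtool. A minimal sketch, assuming the interface is named eth0 (a placeholder; check your system) and a copper gigabit link; note that both the NIC and the switch port must agree, or the link can fall back to half duplex and show heavy loss:

```shell
# Inspect what the NIC and switch actually negotiated first:
ethtool eth0

# On 10/100 links the media can be forced outright:
ethtool -s eth0 speed 100 duplex full autoneg off

# 1000BASE-T requires autonegotiation, so "forcing" gigabit is done by
# restricting the advertised modes to 1000baseT/Full only (0x020 is the
# ethtool advertise bit for that mode):
ethtool -s eth0 advertise 0x020
```

These are diagnostic/configuration commands to run by hand, not a script.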

Cheers
Joris

>
> I'm working on creating a more production-like lab where I can test a
> number of different aspects of the LB to see what else I can do in
> terms of performance. I will make lots of use of halog -srv, along with
> other tools, to measure performance and to see if I can track down any
> issues in our current H/W setup.
>
> Thanks for all the help,
>
> Matt C
>
> On Thu, Jun 9, 2011 at 10:20 PM, Willy Tarreau <w...@1wt.eu> wrote:
>> On Thu, Jun 09, 2011 at 04:04:26PM -0700, Matt Christiansen wrote:
>>> I added in tune.bufsize 65536 and right away things got better. I
>>> doubled that to 131072 and all of the outliers went away. Set at that
>>> value, my tests show haproxy is faster than nginx on 95% of
>>> responses and on par with nginx for the last 5%, which is fine with me
>>> =).
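For reference, a minimal sketch of where this setting lives, assuming the final value from the thread (the directive is `tune.bufsize` in haproxy's global section; the rest of the configuration is omitted):

```
global
    # per-buffer size in bytes; haproxy allocates two buffers per connection
    tune.bufsize 131072
```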
>>
>> Nice, at least we have a good indication of what may be wrong. I'm
>> pretty sure you're experiencing a significant packet loss rate.
>>
>>> What is the downside to setting this so high? If it's just RAM
>>> usage, all of our LBs have 16GB of RAM (don't ask why), so if that's
>>> all, I don't think having it that high will be an issue.
>>
>> Yes, it's just an impact on RAM. There are two buffers per connection,
>> so each connection consumes 256kB of RAM in your case. Multiply that
>> by 2000 concurrent connections and that's 512MB, which is still small
>> compared to what is present in the machine :-)
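The arithmetic above can be checked with a few lines of Python; a sketch assuming the final `tune.bufsize` of 131072 bytes and two buffers per connection, as stated in the thread:

```python
# Buffer-memory arithmetic from the thread: two buffers per connection,
# each of tune.bufsize bytes.
BUFSIZE = 131072          # bytes (the doubled value from the thread)
BUFFERS_PER_CONN = 2      # one request buffer, one response buffer

def buffer_ram_gib(connections: int) -> float:
    """Return total buffer RAM in GiB for N concurrent connections."""
    return connections * BUFFERS_PER_CONN * BUFSIZE / 2**30

# 2000 concurrent connections -> roughly 0.5 GiB ("512MB" above)
print(round(buffer_ram_gib(2000), 2))   # 0.49
# 10,000 connections for the holiday season -> roughly 2.44 GiB ("2.5 GB")
print(round(buffer_ram_gib(10000), 2))  # 2.44
```

Both figures match the estimates quoted earlier in the thread, and both are well under the 16GB available on the LBs.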
>>
>> However, you should *really* try to spot what is causing the issue,
>> because right now you're just sweeping it under the carpet, and it's
>> not completely hidden, as retransmits still take some time to be sent.
>>
>> Many people have encountered the same problem with Broadcom NetXtreme2
>> network cards, particularly pronounced on those shipped in many HP
>> machines (firmware 1.9.6). The issue was a huge Tx drop rate
>> (which is not reported in netstat). A tcpdump on the machine and another
>> one on the next hop can show that some outgoing packets never reach their
>> destination.
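A sketch of that two-capture technique, assuming the LB reaches the next hop over eth0 and the peer's address is 10.0.0.10 (both are placeholders for your own interface and address):

```shell
# On the load balancer, capture traffic to the next hop:
tcpdump -ni eth0 -w lb.pcap host 10.0.0.10

# On the next hop, capture the same traffic:
tcpdump -ni eth0 -w hop.pcap host 10.0.0.10

# Compare the two captures afterwards: packets present in lb.pcap but
# absent from hop.pcap were handed to the NIC but never arrived, which
# matches the silent Tx-drop pattern described above.
```

These are diagnostic commands to run on the respective machines, not a self-contained script.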
>>
>> It is also possible that a piece of equipment is dying (e.g. a switch
>> port) and that the issue will get worse over time.
>>
>> You should run "halog -srv" on the logs which exhibit the varying
>> times. It will output the average connection times and response times
>> per server. If you see that all servers are affected, you'll conclude
>> that the issue is closer to haproxy. If you see that just a group of
>> servers is affected, you'll conclude that the issue only lies around
>> them (maybe you'll identify a few older servers too).
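A sketch of the invocation, assuming the logs live in /var/log/haproxy.log (the path is a placeholder):

```shell
# Per-server report: halog reads the haproxy log on stdin and, with
# -srv, prints one line per server including average connect and
# response times, as described above.
halog -srv < /var/log/haproxy.log
```

Comparing the per-server averages across servers is what separates an LB-side problem (all servers slow) from a backend- or path-side problem (only some servers slow).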
>>
>> Regards,
>> Willy
>>
>>
>
>
