Thanks, Willy, for such a detailed description. It seems machine
configuration matters a lot; after I changed machines, I am getting
improved results.

Thanks

On Mon, Oct 22, 2012 at 12:11 PM, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Vikash,
>
> On Sun, Oct 21, 2012 at 11:20:32PM +0530, freak 62 wrote:
> > What should be the minimum configuration of a machine such that HAProxy
> > running on it can hold 30K-50K conn/sec for a total of 500,000
> > connections?
>
> You're looking at the high end of the range here, so you absolutely need
> to run a benchmark on your machine. First, you need to know the average
> object size so that you can convert the 50kcps into bandwidth. You can
> achieve 50kcps on a gig link if all you're returning is HTTP 304, or if
> you're dealing with massive attacks and just closing those connections.
> But if you're transferring more than 1.4 kB of response headers + data,
> the response will be composed of two TCP segments and the gig link will
> be too tight, so you'll need 10G.
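>
> To make that concrete, a rough back-of-the-envelope conversion (the
> 1.5 kB per full segment on the wire is an assumption including frame
> headers):
>
>     50,000 conn/s x 2 segments x 1,500 B x 8 bits = 1.2 Gbit/s
>
> which already exceeds a gig link before counting ACKs and request
> traffic.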
>
> Second, 500k connections will eat a lot of memory. Assuming these
> connections will mostly remain idle (long polling), given the ratio
> you're proposing, we can say that the kernel alone will require at
> least 16 kB per connection (4 kB read + 4 kB write buffers per socket,
> and per side). And haproxy can be tuned to use about 17 kB with 8 kB
> buffers (for normal HTTP traffic), or you can go down to 4 kB buffers
> if you're only doing small transfers. Let's stay on the safe side:
> 16 kB for the system + 17 kB for haproxy = 33 kB per connection. This
> is 16.5 GB of RAM. You definitely need some RAM for the system to work,
> and I recommend that network buffers (kernel + haproxy) not represent
> more than 2/3 of the system's memory, so you need at least 24 GB of
> RAM. Let's go to 32 to be safe.
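>
> Spelling the arithmetic out:
>
>     33 kB x 500,000 connections  = 16.5 GB for network buffers
>     16.5 GB / (2/3)             ~= 24.75 GB of total RAM, minimum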
>
> You need a massive amount of system tuning too, to be able to support
> 1 million file descriptors (each of the 500k proxied connections needs
> two sockets: one on the client side and one on the server side).
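>
> As a hedged sketch of the direction this tuning takes (the values are
> illustrative, adapt them to your distribution):
>
>     # allow ~2M file descriptors system-wide
>     sysctl -w fs.file-max=2000000
>     sysctl -w fs.nr_open=2000000
>
>     # haproxy global section
>     global
>         maxconn 500000
>         tune.bufsize 8192    # the 8 kB buffers discussed above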
>
> You need to architect your site so that haproxy can spread the load
> over enough servers that the number of source ports does not become
> the limiting factor. Consider roughly 50k usable source ports per
> address: you'll need to run on 10 servers and have haproxy manage the
> source ports itself using the "source" parameter on each "server" line.
> If you have fewer servers, then you need multiple source addresses on
> haproxy, or you need it to transparently bind to the client's IP
> address, run in transparent mode, and become the default gateway for
> your servers. This also comes with a cost in packet rate.
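>
> For illustration, a minimal sketch of the "source" approach (addresses
> and names are hypothetical):
>
>     backend pool
>         server srv1 192.168.0.11:80 source 192.168.0.1:1024-65535
>         server srv2 192.168.0.12:80 source 192.168.0.2:1024-65535
>
> The transparent variant would instead use something like
> "source 0.0.0.0 usesrc clientip", which requires TPROXY support in
> the kernel.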
>
> > I am using a Dell desktop, and the configuration is:
> >      Model name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
> >      Number of processors: 8
> >      Memory: 4 GB
> >
> > Will setting nbproc=8 ensure that HAProxy runs on 8 cores?
>
> In general yes, but it's the system's scheduler which decides. However,
> the more cores you use, the lower the performance will be, because
> moving data across CPU caches is extremely inefficient.
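>
> For reference, the directive lives in the global section:
>
>     global
>         nbproc 2    # illustrative; more processes often means more
>                     # cache traffic, not more performance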
>
> In practice, to obtain the highest connection rates, you have to pin
> network IRQs to one core and haproxy to another core, as close as
> possible to the IRQ one, ideally sharing the same L2 cache or, failing
> that, the same L3. Don't put it on a core which does not share any
> cache with the first one.
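>
> A hedged example of such pinning (the IRQ number and core layout are
> assumptions, check /proc/interrupts and your CPU topology):
>
>     # pin the NIC's IRQ (say, 40) to core 0
>     echo 1 > /proc/irq/40/smp_affinity
>     # run haproxy pinned to core 1, which shares a cache with core 0
>     taskset -c 1 haproxy -f /etc/haproxy/haproxy.cfg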
>
> The best-performing CPU is the one with the highest frequency and the
> largest cache shared between the two cores in use. For instance, an
> i7-3770 at 3.4 GHz with 8 MB of shared L3 cache should be nice. And
> such a CPU can be pushed to 3.9 GHz if you limit it to two cores only.
>
> BTW, a Core i7 930 has 4 physical cores, not 8 (the 8 "processors" you
> see are hyper-threads), so never make your system run on more cores
> than are actually available; it will constantly context-switch and the
> performance will be even lower.
>
> > What other parameters should be set to ensure that HAProxy does not
> > become the bottleneck?
>
> Every detail counts; you absolutely need to run a benchmark. Something
> as stupid as network interrupt latency has a huge impact, because
> depending on the processing latency, you can see the NIC driver switch
> to polling mode and have one CPU core completely dedicated to
> ksoftirqd. If the IRQ was not correctly pinned to its own core, it
> means the load will be shared with haproxy! You also need to tune your
> socket buffers and system backlogs for the average transfer size.
> Another (stupid) example: some people install graphics environments on
> their servers (very bad idea) and are surprised to see low performance.
> Often this is caused by the GPU using shared memory and introducing
> significant memory access latencies. On my laptop for example, I get
> 10% more network performance by killing X and disabling the frame
> buffer. The card then switches to real text mode, where the memory
> bandwidth is ridiculous (100 kB/s) and the video buffer fits in the
> cache (4 kB).
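>
> Regarding the socket buffers and backlogs mentioned above, a hedged
> starting point (the values depend entirely on your traffic profile):
>
>     sysctl -w net.core.somaxconn=65535
>     sysctl -w net.ipv4.tcp_max_syn_backlog=65535
>     sysctl -w net.core.netdev_max_backlog=100000
>     # min/default/max socket buffer sizes, biased to small transfers
>     sysctl -w net.ipv4.tcp_rmem="4096 16384 65536"
>     sysctl -w net.ipv4.tcp_wmem="4096 16384 65536"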
>
> There is no one-size-fits-all recipe; you need to run a benchmark.
>
> Regards,
> Willy
>
>
