Thanks Willy, for such a detailed description. It seems the machine configuration matters more than I expected; after changing machines I am getting improved results.
Thanks.

On Mon, Oct 22, 2012 at 12:11 PM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Vikash,
>
> On Sun, Oct 21, 2012 at 11:20:32PM +0530, freak 62 wrote:
> > What should be the minimum configuration of a machine such that
> > Haproxy running on it can hold up to 30K~50K conn/sec for a total of
> > 500000 connections?
>
> You're looking at the high end of the range here, so you absolutely
> need to run a benchmark on your machine. First, you absolutely need to
> know the average object size so that you can convert the 50k conn/s
> into bandwidth. You can achieve 50k conn/s on a gig link if all you're
> returning is HTTP 304, or if you're dealing with massive attacks and
> just close these connections. But if you're transferring more than
> 1.4 kB of response headers + data, the response will be composed of two
> TCP segments and the gig link will be too tight, so you'll need 10G.
>
> Second, 500k connections will eat a lot of memory. Assuming that these
> connections will mostly remain idle (long-polling) connections, given
> the ratio you're proposing, we can say that the kernel alone will
> require at least 16 kB per connection (4 kB read + 4 kB write buffers
> per socket and per side). And haproxy can be tuned to use about 17 kB
> with 8 kB buffers (for normal HTTP traffic), or you can go down to 4 kB
> buffers if you're only doing small transfers. Let's stay on the safe
> side: 16 kB for the system + 17 kB for haproxy = 33 kB per connection,
> which is 16.5 GB of RAM. You definitely need some RAM left for the
> system to work, and I recommend that network buffers (kernel + haproxy)
> don't represent more than 2/3 of the system's memory, so you need at
> least 24 GB of RAM. Let's go to 32 to be safe.
>
> You also need a massive amount of system tuning to be able to support
> 1 million file descriptors.
>
> You need to architect your site so that haproxy can spread the load
> over enough servers that the number of source ports does not become the
> limiting factor. Consider 50k usable source ports.
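[Aside, not part of the original mail: Willy's memory arithmetic above can be sanity-checked in a few lines of shell. The 16 kB and 17 kB figures are his estimates, not measured values.]

```shell
# Sanity check of the per-connection memory estimate quoted above.
# Figures are Willy's estimates: 16 kB kernel + 17 kB haproxy per connection.
KERNEL_KB=16
HAPROXY_KB=17
CONNS=500000

PER_CONN_KB=$((KERNEL_KB + HAPROXY_KB))   # 33 kB per connection
TOTAL_KB=$((PER_CONN_KB * CONNS))         # total kB for all connections
TOTAL_GB_X10=$((TOTAL_KB / 100000))       # tenths of a (decimal) GB

echo "per connection: ${PER_CONN_KB} kB"
echo "total: $((TOTAL_GB_X10 / 10)).$((TOTAL_GB_X10 % 10)) GB"
```

which reproduces the 33 kB/connection and 16.5 GB figures from the mail.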
> You'll need to run on 10 servers and have haproxy manage the source
> ports itself using the "source" parameter on each "server" line. If you
> have fewer servers, then you need multiple source addresses on haproxy,
> or you need it to transparently bind to the client's IP address, run in
> transparent mode and become the default gateway for your servers. This
> also comes with a cost in packet rate.
>
> > I am using a Dell desktop and the configuration is:
> > Model name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
> > No. of processors: 8
> > Memory: 4 GB
> >
> > Will setting nbproc=8 ensure that Haproxy runs on 8 cores?
>
> In general yes, but it's the system's scheduler that decides. However,
> the more cores you set, the lower the performance will be, because
> moving data across CPU caches is extremely inefficient.
>
> In practice, to obtain the highest connection rates, you have to pin
> network IRQs to one core and haproxy to another core, the closest
> possible to the IRQ one, ideally sharing the same L2 cache or, failing
> that, the same L3. Don't set it on a core which does not share a cache
> with the first one.
>
> The best-performing CPU has the highest frequency and the largest cache
> shared between the two cores in use. For instance, an i7 3770 at
> 3.4 GHz with 8 MB of shared L3 cache should be nice. Such a CPU can be
> pushed to 3.9 GHz if you limit it to two cores only.
>
> BTW, a Core i7 930 has 4 cores (8 threads), not 8 cores, so never make
> your system run on more cores than are available; it will constantly
> context-switch and the performance will be even lower.
>
> > What other parameters should be set to ensure that Haproxy does not
> > become the bottleneck?
>
> Every detail counts; you absolutely need to run a benchmark.
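[Aside, not part of the original mail: the "source" parameter technique described above could look like the following haproxy.cfg sketch. The backend name and all addresses are made-up examples, not from this thread.]

```
backend app
    # Each server gets its own source address, so each server/source pair
    # has its own pool of ~50k ephemeral source ports.
    server s1 10.0.0.1:80 source 10.0.1.1
    server s2 10.0.0.2:80 source 10.0.1.2

    # Transparent-mode alternative (requires TPROXY support in the kernel):
    # haproxy binds to the client's own address, which removes the
    # source-port limit, but the haproxy machine must then be the
    # servers' default gateway.
    # server s3 10.0.0.3:80 source 0.0.0.0 usesrc clientip
```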
> Something as stupid as network interrupt latency has a huge impact:
> depending on the process latency, you can see the NIC driver switch to
> polling mode and one CPU core become completely dedicated to ksoftirqd.
> If the IRQ was not correctly pinned to its own core, that load will be
> shared with haproxy! You also need to tune your socket buffers and
> system backlog for the average transfer size. Another (stupid) example:
> some people install graphical environments on their servers (a very bad
> idea) and are surprised to see low performance. Often this is caused by
> the GPU using shared memory and introducing significant memory-access
> latencies. On my laptop, for example, I get 10% more network
> performance by killing X and disabling the frame buffer. The GPU then
> switches to real text mode, where the memory bandwidth it needs is
> ridiculously small (100 kB/s) and fits in the cache (4 kB).
>
> There is no one-size-fits-all recipe; you need to run a benchmark.
>
> Regards,
> Willy
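[Aside, not part of the original mail: the IRQ/haproxy pinning advice above might be sketched as below. The core choices and the IRQ number are hypothetical and differ per machine; the commented-out commands need root.]

```shell
# Compute hex affinity masks for two cores sharing a cache.
# Hypothetical choice: core 0 for the NIC IRQ, core 1 for haproxy.
IRQ_CORE=0
HAPROXY_CORE=1
IRQ_MASK=$(printf '%x' $((1 << IRQ_CORE)))
HAP_MASK=$(printf '%x' $((1 << HAPROXY_CORE)))
echo "IRQ mask: 0x${IRQ_MASK}, haproxy mask: 0x${HAP_MASK}"

# As root, with the NIC's actual IRQ number taken from /proc/interrupts
# (24 below is only an example):
#   echo "$IRQ_MASK" > /proc/irq/24/smp_affinity
#   taskset -p "0x${HAP_MASK}" "$(pidof haproxy)"
```

Pinning the IRQ and the process to *adjacent* cores that share an L2 or L3 cache, rather than to the same core, is the point Willy makes above.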