--On Wednesday, February 12, 2003 11:52 AM -0600 Min Xu <[EMAIL PROTECTED]> wrote:

> First, I don't think the disk should be the bottleneck in any case,
> because the system has 2GB of memory and Solaris's file cache is able
> to cache all of the file content. top shows the following stats:

The amount of memory has nothing to do with the bandwidth available to that memory. I believe recent SPARCs still use only 133MHz RAM (PC133 at best; SPARCs don't use DDR yet, I think). Since all code pages will most likely not fit entirely in the CPU cache, some of them have to be read from main memory. IIRC, some versions of the UIIIi have 4MB of CPU cache, but I wouldn't be surprised if that's not enough (kernel pages would also have to be counted). (I don't know your specifics here.)

So, if you have 14 processors (I think that's what you said you had), they will all be contending for the ~133MHz memory bus. The effective memory bandwidth per processor works out to roughly 133/14, or about 9.5MHz. That's a severe bottleneck if main memory is on the critical path for every process.

All MP SPARCs share the same memory backplane. That's why you hardly ever see performance improvements past 8 CPUs: the memory bandwidth kills you (the CPUs are starved for memory). Moving to a NUMA architecture might help, but I don't think that's a feature UltraSPARC or Solaris supports. (I hear Linux has experimental NUMA support now.)

I'd recommend reading http://www.sunperf.com/perfmontools.html. You should also experiment with mod_mem_cache and mod_disk_cache.
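If you want to try the caching modules, a minimal httpd.conf sketch for Apache 2.0 might look like the following; the module paths, the /repos URL space, and the cache sizes are only placeholder assumptions for your setup:

    # Load mod_cache and the in-memory provider (adjust paths for your build).
    LoadModule cache_module modules/mod_cache.so
    LoadModule mem_cache_module modules/mod_mem_cache.so

    # Cache responses for the repository URL space in memory.
    CacheEnable mem /repos
    MCacheSize 262144            # total in-memory cache, in KBytes
    MCacheMaxObjectCount 10000   # maximum number of cached objects
    MCacheMaxObjectSize 1048576  # largest object to cache, in bytes

    # Or try the disk cache instead; Solaris's page cache keeps hot files in RAM anyway.
    # LoadModule disk_cache_module modules/mod_disk_cache.so
    # CacheEnable disk /repos
    # CacheRoot /var/cache/httpd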

> To test the context switching hypothesis and the backplane
> hypothesis, I changed all files in the repository to 2 bytes long
> (an "a" plus an EOF). I reran the experiment, and the performance
> was worse!

There will still be overhead in the OS networking layer. You are using connection keep-alives and pipelining, right? Given that your top output showed a lot of kernel time, I'd bet you are spending a lot of it contending on the virtual (loopback) network, which is usually the case when you are not using connection keep-alives: the TCP stack just gets hammered. I'd also bet the local network path is not optimized for performance (DMA can't be used, and work that could be offloaded to dedicated hardware must be done on the main CPU).
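To rule that out, the stock Apache keep-alive directives are easy to turn on; a sketch, with the timeout and request count as example values only:

    # httpd.conf: let clients reuse connections across requests
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 15

On the client side, ApacheBench's -k flag enables keep-alive, e.g. "ab -k -c 10 -n 10000 http://yourserver/somefile" (the host name, concurrency, and request count are placeholders).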

Please stop trying to convince us to pay attention to benchmarks where the client and server are on the same machine. There are just too many variables that will screw things up. The performance characteristics change dramatically when they are physically separate boxes. -- justin

