* Benny Löfgren <bl-li...@lofgren.biz> [2011-01-07 20:45]:
> On 2011-01-07 19.54, Ted Unangst wrote:
> >>experiment with parallel ports building on a 64-way sparc64 T2 went.
> >>With 32 build jobs it looked like this:
> >><landry_p22>  0.8%Int  48.9%Sys   6.0%Usr   0.0%Nic  44.3%Idle
> >><landry_p22>  around that all the time
> >My understanding is that the T2 is closer to an 8-way machine.  If we
> >could recognize the real cores and balance appropriately, 8 build jobs
> >shouldn't be too bad.
> >At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
> >well.
> 
> Just to illustrate, a quick test on my 8-core (2 cpu x 4 core)
> Supermicro AMD box (compile a GENERIC.MP kernel):
> 
> # make clean && make depend
> # time make
> ...
>     3m26.78s real     2m43.73s user     0m35.08s system
> 
> # make clean && make depend
> # time make -j8
> ...
>     0m47.40s real     2m52.75s user     3m1.70s system
> 
> At first glance it doesn't scale all that well: about 4.4 times
> quicker real time when running eight compiler tasks simultaneously
> compared to a single one.
> 
> But the server isn't idle to begin with (it is run in quite heavy
> production), and this sort of test is of course not processor-only.
> Also, both tests were run with the MP kernel, so even the
> single-task test would probably utilize several cores at times.

indeed - your test has some flaws. but still, the scaling it shows
isn't all that bad - and keep in mind that cores typically share a bit
more than separate CPUs. this can have advantages or disadvantages.
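
to put numbers on that, a quick sketch in python. it is plain
arithmetic on the two `time make` results quoted above, nothing newly
measured; the variable names are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert the m/s pairs printed by time(1) into seconds."""
    return minutes * 60 + seconds


# figures copied from the quoted `time make` and `time make -j8` runs
serial = {"real": to_seconds(3, 26.78),
          "user": to_seconds(2, 43.73),
          "system": to_seconds(0, 35.08)}
parallel = {"real": to_seconds(0, 47.40),
            "user": to_seconds(2, 52.75),
            "system": to_seconds(3, 1.70)}
jobs = 8

speedup = serial["real"] / parallel["real"]       # wall-clock gain
efficiency = speedup / jobs                       # share of the ideal 8x
cpu_serial = serial["user"] + serial["system"]    # total cpu time, 1 job
cpu_parallel = parallel["user"] + parallel["system"]

print(f"speedup:    {speedup:.2f}x")
print(f"efficiency: {efficiency:.0%}")
print(f"cpu time:   {cpu_serial:.2f}s -> {cpu_parallel:.2f}s")
```

user time barely moves between the two runs; nearly all of the extra
cpu time in the -j8 run is system time.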

the box i have in mind does two things that matter for this
discussion:
- takes backups for/from many servers
- does dns & webalizer on webserver logfiles (many, many, from many
  webservers)

the backup sounds I/O-heavy - and of course kinda is. but the biggest
load is gzip. the backup stuff i wrote myself over many years, it has
a nifty scheduler that parallelizes nicely.
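
a minimal sketch of the general idea in python, not the actual backup
scheduler (which isn't shown here): compress many files at once with a
small worker pool. gzip_file and gzip_many are hypothetical names, and
using one worker per cpu is an assumption.

```python
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def gzip_file(src):
    """Compress src to src + '.gz' and return the new path."""
    dst = src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gzip_many(paths, workers=None):
    """Compress all paths, running up to `workers` jobs at once."""
    # assumption: one worker per cpu is a sensible default
    workers = workers or os.cpu_count() or 1
    # cpython's zlib releases the GIL while compressing, so threads can
    # keep several cores busy; a process pool would be the alternative
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, paths))
```

the source files are left in place; removing them after a successful
compress is up to the caller.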

the webserver logfile processing suffers from dns latency (local cache
of course, but still). massive massive massive parallel processing (i
wrote that stuff, too) drives it to a point where all CPUs are almost
100% busy (well, see below).
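
why piling on parallelism helps with a latency-bound job can be
sketched like this (python, not the real log processing code):
each lookup mostly waits, so many of them are kept in flight at once
and every distinct address is resolved only once. reverse_lookup,
resolve_all and the worker count of 64 are assumptions for
illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(addr):
    """Return the hostname for addr, or addr itself if lookup fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:
        # covers socket.herror and socket.gaierror as well
        return addr


def resolve_all(addrs, resolver=reverse_lookup, workers=64):
    """Resolve each distinct address once, many lookups in flight."""
    unique = sorted(set(addrs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = pool.map(resolver, unique)
        return dict(zip(unique, names))
```

the resolver is a parameter so a caching or fake resolver can be
swapped in without touching the pool logic.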

the backup runs for about 3 hours with all CPUs busy. the webserver
logfile thing usually takes about 2 hours, but only one hour with
everything busy; afterwards only the big logs are still being
processed and the latency is the limiting factor.

the box used to be a dual xeon 2.2 (the older, p4-based heating plate),
with hyperthreading, so 4 logical CPUs with ami RAID 5. the backup
scales almost perfectly, more than 3.5x faster with the 4 logical CPUs vs
just one. webserver log processing gives the same picture.
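
one rough way to read a figure like 3.5x on 4 logical CPUs is the
Karp-Flatt metric, which gives the serial share implied by a measured
speedup. a sketch in python; serial_fraction is a made-up helper, and
treating the hyperthreads as 4 equal CPUs is a simplification.

```python
def serial_fraction(speedup, cpus):
    """Karp-Flatt metric: serial share implied by a measured speedup."""
    return (1 / speedup - 1 / cpus) / (1 - 1 / cpus)


# 3.5x on 4 logical CPUs, the lower bound mentioned above
print(f"{serial_fraction(3.5, 4):.3f}")
```

since the speedup was more than 3.5x, the value printed is an upper
bound on the implied serial share.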

since wednesday it is an intel E7500, 2.93GHz, 2 cores, a sata disk to
boot from and two big sata disks, softraid raid 1. it is slightly
faster than the previous one. pls note that i can only give estimates,
since backup and webserver log processing performance are influenced
by external factors.

and since somebody is going to ask - the separate boot disk (that
holds OS and everything, just not the raw data) is there to make it
easy to replace the data disks with bigger ones.

so for these tasks, we scale perfectly fine.

throwing more than one cpu (core) at a database server running just
one mysqld instance is not going to help right now. that's likely to
change with rthreads, though.

throwing more than one core at a firewall (without much proxy stuff in
userland) hurts more than it helps right now.

guess my point is clear. we scale fine for many (I'd even say most)
tasks. we scale miserably for some others. yes, our SMP can be
improved, but it isn't bad. heck, what cannot be improved?

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
