On 2009-May-04 19:26:13 -0700, William Stein <[email protected]> wrote:
>Unfortunately, it seems that this does *NOT* mean that if you write a
>little C program, spawn 128 threads, and watch them run, then you can
>do 128 times what you would do with 1 thread.  You still can only do 8
>times as much as with 1 thread.   For raw computation, I don't think
>that processor is any better than 8 single cores.

Each T2 core provides a single integer ALU and a single FPU (plus a
crypto unit, which can probably be ignored for this purpose), so an
8-core T2 has only 8 ALUs.  Each core is time-sliced in hardware between
8 threads, giving 64 threads per chip.  When a thread stalls (e.g.
waiting for a memory access), the hardware switches that core to a
different thread.  For small programs this doesn't gain you much,
because everything is cached.  You should see decent speedups on code
that is cache-busting: if you run 128 copies of a program that (e.g.)
transposes a 2000x2000 matrix of longs or doubles, you should see a
better speedup.

Note that for real-world apps, kernel and serialisation overheads
(e.g. threads queueing on a shared lock) can seriously hurt you.

BTW, anyone looking at using a T1 should be aware that it has only a
single FPU shared by all cores, so its FP performance is "poor".

-- 
Peter Jeremy
