On 2009-May-04 19:26:13 -0700, William Stein wrote:
>Unfortunately, it seems that this does *NOT* mean that if you write a
>little C program, spawn 128 threads, and watch them run, then you can
>do 128 times what you would do with 1 thread. You still can only do 8
>times as much as with 1 thread. For raw computation, I don't think
>that processor is any better than 8 single cores.
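A minimal sketch of the kind of experiment described in the quote. It assumes POSIX threads and `clock_gettime` are available. Each thread runs the same integer-only loop (an xorshift generator, chosen here only as arbitrary ALU-bound work), and `run_spin` (a hypothetical helper name) returns the wall-clock time. Dividing `nthreads * iters_per_thread` by that time gives aggregate throughput. If the hardware threads on a core share its execution units, that figure should level off near the core count rather than keep growing with the thread count.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Arguments and result for one worker thread. */
typedef struct {
    uint64_t iters;  /* number of loop iterations to run */
    uint64_t result; /* final state, stored so the loop is not optimised away */
} spin_arg;

/* ALU-bound work: xorshift64 with its state held in a local variable,
   so the loop is dominated by integer shifts and XORs. */
static void *spin(void *p)
{
    spin_arg *a = p;
    uint64_t x = 88172645463325252ULL;
    for (uint64_t i = 0; i < a->iters; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
    }
    a->result = x;
    return NULL;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Start nthreads workers, each running iters_per_thread iterations, and
   return the wall-clock time in seconds (or -1.0 on failure). */
double run_spin(int nthreads, uint64_t iters_per_thread)
{
    pthread_t *tid = malloc(sizeof *tid * (size_t)nthreads);
    spin_arg *args = malloc(sizeof *args * (size_t)nthreads);
    double elapsed = -1.0;
    int started = 0;

    if (tid != NULL && args != NULL) {
        double t0 = now_sec();
        for (; started < nthreads; started++) {
            args[started].iters = iters_per_thread;
            if (pthread_create(&tid[started], NULL, spin, &args[started]) != 0)
                break;
        }
        for (int i = 0; i < started; i++)
            pthread_join(tid[i], NULL);
        if (started == nthreads)
            elapsed = now_sec() - t0;
    }
    free(tid);
    free(args);
    return elapsed;
}
```

One way to use it is to call `run_spin` with 1, 8, 64 and 128 threads and the same per-thread iteration count, then compare throughput. The exact point where it stops scaling will depend on the chip and on how the OS places the threads.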
Each T2 core provides a single integer ALU and a single FPU (plus a crypto unit, which can probably be ignored for this purpose), so an 8-core T2 has only 8 ALUs. Each core is time-sliced in hardware between 8 threads, giving 64 threads per chip. When a thread stalls (e.g. waiting for a memory access), the hardware switches that core to a different thread.

For small programs this doesn't gain you much: everything is cached, so threads rarely stall and the threads on a core simply take turns on the same ALU. You should see decent speedups on cache-busting code, where stalls are frequent and another thread can use the ALU while one waits on memory. If you try running 128 copies of a program to (e.g.) transpose a 2000x2000 matrix of longs or doubles, you should see a much better speedup.

Note that for real-world apps, kernel and serialisation overheads can seriously hurt you.

BTW, anyone looking at using a T1 should be aware that it has only a single FPU shared by all cores. This means FP performance is "poor".

-- 
Peter Jeremy
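A sketch of the cache-busting measurement suggested in the reply. It assumes POSIX threads, and `transpose` and `run_transpose` are hypothetical names for illustration. Each thread repeatedly transposes its own private n x n matrix of doubles with a naive loop whose column-order stores touch a new cache line on almost every iteration. With n = 2000 each matrix is 32 MB (2000 x 2000 x 8 bytes), so two per thread is 64 MB, and 128 threads need roughly 8 GB. Reduce n or the thread count on smaller machines.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* One private pair of n x n matrices per thread, so threads share no data. */
typedef struct {
    double *src;
    double *dst;
    int n;
    int reps;
} xpose_arg;

/* Naive transpose: reads src along rows but writes dst down columns, so for
   large n nearly every store lands on a different cache line. */
void transpose(const double *src, double *dst, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dst[(size_t)j * n + i] = src[(size_t)i * n + j];
}

static void *worker(void *p)
{
    xpose_arg *a = p;
    for (int r = 0; r < a->reps; r++)
        transpose(a->src, a->dst, a->n);
    return NULL;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run nthreads independent copies, each transposing its own n x n matrix
   reps times. Returns wall-clock seconds for the threaded phase only, or
   -1.0 on failure. The reply's example corresponds to n = 2000. */
double run_transpose(int nthreads, int n, int reps)
{
    size_t elems = (size_t)n * (size_t)n;
    pthread_t *tid = calloc((size_t)nthreads, sizeof *tid);
    xpose_arg *args = calloc((size_t)nthreads, sizeof *args); /* NULL ptrs */
    double elapsed = -1.0;
    int ok = (tid != NULL && args != NULL);

    /* Allocate and fill every thread's input before starting the clock. */
    for (int i = 0; ok && i < nthreads; i++) {
        args[i].n = n;
        args[i].reps = reps;
        args[i].src = malloc(elems * sizeof(double));
        args[i].dst = malloc(elems * sizeof(double));
        if (args[i].src == NULL || args[i].dst == NULL) {
            ok = 0;
            break;
        }
        for (size_t k = 0; k < elems; k++)
            args[i].src[k] = (double)k;
    }

    if (ok) {
        int started = 0;
        double t0 = now_sec();
        for (; started < nthreads; started++)
            if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0)
                break;
        for (int i = 0; i < started; i++)
            pthread_join(tid[i], NULL);
        if (started == nthreads)
            elapsed = now_sec() - t0;
    }

    if (args != NULL)
        for (int i = 0; i < nthreads; i++) { /* free(NULL) is a no-op */
            free(args[i].src);
            free(args[i].dst);
        }
    free(tid);
    free(args);
    return elapsed;
}
```

Comparing the time per copy across thread counts is the measurement the reply describes. If latency hiding works as claimed, that time should grow much more slowly than the thread count, up to the hardware thread limit, than it does for the ALU-bound loop in the quoted experiment.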
