It's important to read the documentation for the NAS benchmarks
before using them.  Pierre's only half right - communication here
IS the problem, but because of the latency, not the throughput.

The NAS benchmarks come in three problem class sizes,
A, B, and C (well, there is a fourth, W, but ignore that for now ;)
A is the smallest benchmark size - i.e., the smallest amount of
work to be done.  Divide that by 64 CPUs and you have a little
computation and a LOT of communication.  On the other hand, C
is the _largest_ benchmark size; divide that by 64 CPUs and you
have a LOT of computation relative to the amount of communication
that you do, so here your speedups should be good.
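To make the latency argument concrete, here's a toy strong-scaling model (all numbers are made up for illustration, not NAS measurements): per-process time is serial work divided by p, plus a flat per-message latency cost that does NOT shrink with p. A small problem stops scaling long before a large one does.

```python
def model_speedup(serial_secs, msgs_per_proc, latency_secs, procs):
    """Fixed-size (strong) scaling with a flat per-message latency cost.

    Toy model: parallel time = compute/p + (messages * latency).
    The communication term is independent of p, so it dominates
    once compute/p shrinks below it.
    """
    t_parallel = serial_secs / procs + msgs_per_proc * latency_secs
    return serial_secs / t_parallel

# Small (class-A-ish) problem: latency dominates at 64 CPUs.
small = model_speedup(serial_secs=66.0, msgs_per_proc=300_000,
                      latency_secs=1e-4, procs=64)   # roughly 2

# Large (class-C-ish) problem: the same latency cost is amortized.
large = model_speedup(serial_secs=3000.0, msgs_per_proc=300_000,
                      latency_secs=1e-4, procs=64)   # far better
```

The 300,000 messages and 100-microsecond latency are hypothetical placeholders; the point is only that the small run lands near a speedup of 2, which is exactly the shape of the class A result below.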

My suggestion would be that you try a class B or class C size
problem before reporting speedup results and complaining that
the speedups are small for a 64-node cluster.

Granted, there is a limit to how much speedup can be obtained
(search for "Amdahl's Law"...), even with excellent hardware
(and I'm talking better hardware than Myrinet here, folks), but
for C sizes CG should at least scale nearly linearly for 
the node ranges you're talking about.
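For reference, the Amdahl's Law bound is easy to compute yourself (a minimal sketch; the 1% serial fraction is just an example, not a CG measurement):

```python
def amdahl_speedup(serial_fraction, procs):
    """Amdahl's Law: speedup = 1 / (s + (1 - s)/p).

    The serial fraction s is never divided by p, so it caps the
    speedup at 1/s no matter how many processors you add.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / procs)

# Even 1% serial code holds a 64-CPU run well under 64x:
bound = amdahl_speedup(0.01, 64)   # a bit over 39x
```

Note the asymptote: with s = 0.01 you can never beat 100x, even on infinite hardware.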

-Bob

  ----- Forwarded message from Pierre Brua -----

  Eric Roman wrote:
  > But when
  > we started doing 64 processor runs we got some horrible performance.
  > Here's an example:
  > 
  >  Name Class   NC      Time      Mop/s Mop/s/proc Version Filename
  >  CG   A        1     65.91      22.71      22.71     2.3 cg.A.1.egcs3-t3
  ...
  >  CG   A       64     28.72      52.11       0.81     2.3 cg.A.64.egcs3
  > 
  > Speedup of 2 for 64 processors?  Does this make any sense whatsoever?

          Those benchmarks are used to test the network speed of
  multi-million-dollar parallel computers that usually have specially
  tuned network cards and protocols. A 100Mb Ethernet + TCP/IP network
  is awful for that because it was _not_ designed for it: if an
  Ethernet packet collides with another one on the wire, the Ethernet
  protocol automatically delays retransmission for a randomly chosen
  amount of time, and Ethernet throughput drops to less than 20% of
  the maximum if you overload the wire.
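For the curious, the random delay Pierre describes is Ethernet's truncated binary exponential backoff from IEEE 802.3; a sketch of the rule (slot counts and limits per the standard, the function name is mine):

```python
import random

def backoff_slots(n_collisions):
    """Truncated binary exponential backoff (IEEE 802.3 CSMA/CD).

    After the n-th successive collision on a frame, the sender waits
    r slot times, with r drawn uniformly from [0, 2**k - 1] where
    k = min(n, 10). After 16 attempts the frame is dropped.
    (One slot time on 10/100Mb Ethernet is 512 bit times.)
    """
    if n_collisions > 16:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(n_collisions, 10)
    return random.randint(0, 2 ** k - 1)
```

This is why a loaded shared-Ethernet segment degrades so badly: every collision doubles the expected wait, and under sustained contention most of the wire's time goes to backoff rather than payload.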
-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]