Quoting Mark Hahn <[EMAIL PROTECTED]>, on Fri 15 Feb 2008 02:25:26 PM PST:


I'm skeptical how much sense VMs in HPC make, though.  yes, it would
be nice to have a container for MPI jobs: checkpoints for free, ability
to do migration.  both these factors depend on the scale of your jobs: if all
your jobs are 4k cpus and up, even a modest node failure rate is going to make
aggressive checkpointing necessary (versus jobs averaging 64p, which are
almost never taken down by a node failure.)  similarly, if your workload
is all serial jobs, there's probably no need at all for migration (versus
a workload with high variance in job size, length, priority, etc).


Perhaps the added overhead of using VMs for "user-transparent checkpointing" is worth it in the same sense that most folks are willing to tolerate the overhead of a compiler and linker instead of working in hex, octal, or binary machine code. Rather than force a researcher to figure out how to do checkpointing, you buy a few dozen more nodes to make up for the extra work.

You spend more on hardware and less on bodies, and since hardware is always getting cheaper (per quantum of "work"), the trade gets more attractive with time.

{Leaving aside interesting philosophical discussions about the incremental cost of labor, especially one's own, versus the capital and operating costs of the iron. I've also noticed that even though we've gone through many, many Moore's Law doublings, with probably a 5000-fold increase in computational horsepower on an engineer's desk every 20 years, design and analysis methodologies change much more slowly. In the RF world, the state of the art in design tools in 1960 was a paper Smith chart, a slide rule, and a healthy dose of simplified analytical approximations. The state of the art in 1980 was simple computer tools that essentially automated the pencil-and-paper techniques, plus some numerical analysis (e.g., SPICE for circuit simulation, which solves matrix equations and does numerical integration, or early electromagnetics codes). The state of the art in 2000 (and today, really) is integrated modeling tools with much larger matrices and tighter integration between FEM codes and circuit-theory analysis (that is, you might model the packaging with an EM code, but you'd use a behavioral model for the semiconductor device rather than applying Maxwell's equations all the way down to the atomic level).

However, even with such nifty tools, a huge number of engineers still use paper-and-pencil-style analysis. Granted, they use Excel instead of their trusty HP45 and a quad pad, but the style of analysis and design is the same. They even teach classes in "RF Design with Excel" (which I view as anathema). Why isn't everyone using the new tools, which hugely improve productivity and the quality of the resulting design?

Capital investment is required (gotta invest in the iron, and the seat license). Familiarity matters too: if you learned to design 20 years ago, you're comfortable with the methodology, you're aware of its limits, and you are satisfied with the precision and accuracy of its results. The latter is another aspect of capital investment: it takes time to get used to a new way of doing things, time that the engineer may not have in an environment that stresses getting the product out the door (or, in the case of where I work, getting to the launch pad in time for the every-two-year launch opportunity for Mars).}

So, against this background, giving up even 80% of the computational horsepower in exchange for a tool that might make you 10 times more productive is a good trade. Sometimes I think the folks developing automatic parallelizers and similar tools are working too hard to make them perfect. If I can take a chunk of software that now takes, say, 1 day to run (requiring periodic interaction, i.e., it's not a batch overnight thing) and get it to run in 10 minutes, that's a huge improvement. Put it in numbers. Say it costs me $3000 for a computer that runs it in a day. If I can run it in 10 minutes (about 50 times faster, taking a day as a working day) and I do one run a day, I don't care if it takes 100 processors to go 50x faster, as opposed to only 50. The extra processors cost me, say, $200K (with the extra overhead for connectivity, facilities, etc.), which is a small fraction of the time saved, because I've essentially replaced 50 engineers with 1 (putting those 49 engineers out on the street, where they will inevitably cause problems... idle hands, playgrounds, and so forth).

In fact, you could have some hideously inefficient scheme that takes 1000 processors to go 10 times faster, and it's probably still a good deal.
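The trade described above is easy to sanity-check with a back-of-the-envelope model. The hardware figures ($200K for the extra processors, 49 engineers' worth of throughput freed) come from the text; the fully-burdened cost per engineer-year is an assumed placeholder, since the original doesn't state one.

```python
# Back-of-the-envelope payback for the hardware-vs-labor trade above.
# $200K of extra hardware and 49 engineers freed are from the text;
# the burdened engineer rate is an assumption for illustration only.

def payback_years(extra_hardware_cost, engineers_freed, cost_per_engineer_year):
    """Engineer-years of freed labor needed to pay off the extra hardware."""
    return extra_hardware_cost / (engineers_freed * cost_per_engineer_year)

extra_hardware = 200_000       # 100 processors plus connectivity, facilities, etc.
engineers_freed = 49           # 50 engineers' output now delivered by 1
burdened_rate = 150_000        # assumed fully-burdened $/engineer-year

years = payback_years(extra_hardware, engineers_freed, burdened_rate)
print(f"Extra hardware pays for itself in {years:.3f} years of freed labor")
```

Under this (assumed) rate, the $200K of extra iron is repaid in a small fraction of a year, which is the point of the argument: even a hideously inefficient parallelization can win on labor cost alone.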

Jim Lux





_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
