Turing completeness isn't really the central question here.  The truth
is, map-reduce programs are under considerable pressure to be written
in a scalable fashion, which limits them to fairly simple behaviors.
The practical result is that, for a given program, run-time depends
roughly linearly on input size.
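
For concreteness, here is a minimal sketch of how that near-linearity
could be exploited, assuming you keep (input size, run-time) pairs from
prior runs of the same job.  The function name and the sample numbers
are made up for illustration:

    # Hypothetical sketch: fit T ~ a + b * input_size from past runs of
    # one job, then predict the run-time for a new input size.
    def fit_linear(history):
        """history is a list of (input_bytes, seconds) pairs from prior runs."""
        n = len(history)
        sx = sum(x for x, _ in history)
        sy = sum(y for _, y in history)
        sxx = sum(x * x for x, _ in history)
        sxy = sum(x * y for x, y in history)
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        a = (sy - b * sx) / n
        return a, b

    history = [(1e9, 120.0), (2e9, 235.0), (4e9, 470.0)]  # made-up numbers
    a, b = fit_linear(history)
    print("predicted seconds for 3 GB input:", a + b * 3e9)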

The cool thing about the paper I linked to the other day is that
enough cues about a program's expected runtime are available to make
good predictions *without* looking at its details.  No doubt the
estimation facility could make good use of something as simple as the
hash of the jar in question, but even without that it is possible to
produce good estimates.
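
And a hypothetical sketch of the jar-hash idea: key run histories by a
digest of the job jar, so repeat submissions of the same program share
an estimator.  None of these names come from the paper; the feature set
and storage are assumptions:

    import hashlib
    from collections import defaultdict

    # Hypothetical: two runs with the same jar digest are treated as
    # the same program for estimation purposes.
    def jar_key(jar_path):
        with open(jar_path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    histories = defaultdict(list)  # jar hash -> [(input_bytes, seconds), ...]

    def record_run(jar_path, input_bytes, seconds):
        histories[jar_key(jar_path)].append((input_bytes, seconds))

    def estimate(jar_path, input_bytes):
        runs = histories[jar_key(jar_path)]
        if len(runs) < 2:
            return None  # not enough history; fall back to coarser features
        a, b = fit_linear(runs)  # reuse the fit from the sketch above
        return a + b * input_bytes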

I suppose that this means that all of us Hadoop programmers are really
just kind of boring folk.  On average, anyway.

On Sun, Apr 17, 2011 at 12:07 PM, Matthew Foley <ma...@yahoo-inc.com> wrote:
> Since general M/R jobs vary over a huge (Turing problem equivalent!) range of 
> behaviors, a more tractable problem might be to characterize the descriptive 
> parameters needed to answer the question: "If the following problem P runs in 
> T0 amount of time on a certain benchmark platform B0, how long T1 will it 
> take to run on a differently configured real-world platform B1 ?"
>
