Since general M/R jobs vary over a huge (Turing-complete!) range of 
behaviors, a more tractable problem might be to characterize the descriptive 
parameters needed to answer the question: "If problem P runs in time T0 on a 
certain benchmark platform B0, how long (T1) will it take to run on a 
differently configured real-world platform B1?"
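As a strawman, the simplest answer to that question assumes the job is throughput-bound and scales with aggregate cluster capacity. The sketch below is purely illustrative (the function, its parameters, and the linear-scaling assumption are mine, not established M/R behavior); real jobs deviate from it badly once shuffle, skew, or stragglers dominate.

```python
def estimate_t1(t0, nodes0, nodes1, node_speed0=1.0, node_speed1=1.0):
    """First-order estimate of T1 on platform B1, given T0 measured on
    benchmark platform B0, assuming runtime is inversely proportional
    to aggregate capacity (node count x relative per-node speed)."""
    capacity0 = nodes0 * node_speed0
    capacity1 = nodes1 * node_speed1
    return t0 * capacity0 / capacity1

# A job that took 600 s on 4 nodes, moved to 8 equally fast nodes:
print(estimate_t1(600, 4, 8))  # 300.0
```

Whether such a model is anywhere near right for a given job is exactly what the experiments below would establish.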

Or are you only dealing with one particular M/R job?  If so, the above is a 
good way to look at it: first identify the controlling parameters, then analyze 
how they co-vary with execution time.  Now you've reduced it to a question that 
can be answered by a series of "make hypothesis" / "do experiment" steps :-)  
Pick a parameter you think is a likely candidate, and make a series of 
measurements of execution time for different values of the parameter.  Repeat 
until you've fully characterized the problem space.
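The hypothesis/experiment loop above can be sketched in a few lines: sweep one candidate parameter, time each run, then fit a simple model to see how the parameter co-varies with execution time. Everything here is a minimal illustration (the `run_job` callable is a stand-in for however you launch the actual M/R job, e.g. via the `hadoop` CLI), and a straight-line fit is just one hypothesis to test.

```python
import time

def measure(run_job, param_values, repeats=3):
    """Run the job at each parameter value; return (value, mean wall-clock
    seconds) pairs. Repeats smooth out run-to-run noise."""
    results = []
    for v in param_values:
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_job(v)
            times.append(time.perf_counter() - start)
        results.append((v, sum(times) / len(times)))
    return results

def fit_linear(points):
    """Ordinary least-squares fit t = a*v + b over (v, t) pairs, so you can
    check whether runtime really is linear in the chosen parameter."""
    n = len(points)
    sx = sum(v for v, _ in points)
    sy = sum(t for _, t in points)
    sxx = sum(v * v for v, _ in points)
    sxy = sum(v * t for v, t in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

If the fitted line explains the measurements well, you have characterized that parameter; if not, try a different functional form or a different parameter and repeat.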

Good luck,
--Matt

On Apr 16, 2011, at 6:39 AM, Sonal Goyal wrote:

What is your MR job doing? What is the amount of data it is processing? What
kind of a cluster do you have? Would you be able to share some details about
what you are trying to do?

If you are looking for metrics, you could look at the TeraSort benchmark runs.

Thanks and Regards,
Sonal
Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Sat, Apr 16, 2011 at 3:31 PM, real great..
<greatness.hardn...@gmail.com> wrote:

> Hi,
> As a part of my final year BE final project I want to estimate the time
> required by a M/R job given an application and a base file system.
> Can you folks please help me by posting some thoughts on this issue or
> posting some links here.
> 
> --
> Regards,
> R.V.
> 
