Since general M/R jobs vary over a huge (Turing-complete!) range of behaviors, a more tractable problem might be to characterize the descriptive parameters needed to answer the question: "If a given problem P runs in time T0 on a certain benchmark platform B0, how long (T1) will it take to run on a differently configured real-world platform B1?"
Or are you dealing with only one particular M/R job? If so, the above is a good way to look at it: first identify the controlling parameters, then analyze how they co-vary with execution time. That reduces it to a question that can be answered by a series of "make hypothesis" / "do experiment" steps :-) Pick a parameter you think is a likely candidate, and make a series of measurements of execution time for different values of that parameter. Repeat until you've fully characterized the problem space.

Good luck,
--Matt

On Apr 16, 2011, at 6:39 AM, Sonal Goyal wrote:

What is your MR job doing? What is the amount of data it is processing?
What kind of a cluster do you have? Would you be able to share some
details about what you are trying to do? If you are looking for metrics,
you could look at the Terasort run.

Thanks and Regards,
Sonal
Hadoop ETL and Data Integration <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Sat, Apr 16, 2011 at 3:31 PM, real great.. <greatness.hardn...@gmail.com> wrote:

> Hi,
> As a part of my final-year BE project I want to estimate the time
> required by an M/R job, given an application and a base file system.
> Can you folks please help me by posting some thoughts on this issue or
> posting some links here.
>
> --
> Regards,
> R.V.
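[Editor's note] The measure-and-fit loop Matt describes can be sketched in a few lines: run the job at several values of one controlling parameter (here, input size), fit a simple model to the measurements, and extrapolate. This is only a minimal illustration with hypothetical timing numbers and an assumed linear model; a real job may scale nonlinearly, so the model choice is itself one of the hypotheses to test.

```python
# Hypothetical measurements: (input size in GB, observed runtime in seconds).
# In practice, each pair comes from actually running the M/R job.
measurements = [(1, 95), (2, 180), (4, 350), (8, 700)]

def fit_linear(points):
    """Ordinary least-squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_linear(measurements)

def predict(size_gb):
    """Extrapolate runtime for an input size not yet measured."""
    return a * size_gb + b

print(f"predicted runtime for 16 GB: {predict(16):.0f} s")
```

Repeating this for each candidate parameter (input size, number of reducers, cluster size, ...) and checking how well the fitted model predicts held-out runs is exactly the "hypothesis / experiment" cycle above.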