On Mon, Mar 1, 2010 at 4:13 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> Theoretically. O(n)
>
> All other variables being equal across all nodes
> should...mmmmm.....reduce to n.
>
> The part that really can't be measured is the cost of Hadoop's
> bookkeeping chores as the data set grows, since some things in Hadoop
> involve synchronous/serial behavior.
>
> On Mon, 2010-03-01 at 12:27 -0500, Edward Capriolo wrote:
>
>> A previous post to core-user mentioned some formula to determine job
>> time. I was wondering if anyone out there is trying to tackle
>> designing a formula that can calculate the job run time of a
>> map/reduce program. Obviously there are many variables here, including
>> but not limited to disk speed, network speed, processor speed, input
>> data, many constants, data skew, map complexity, reduce complexity, #
>> of nodes...
>>
>> As an intellectual challenge, has anyone started trying to write a
>> formula that can take into account all these factors and actually
>> predict a job time in minutes/hours?
>
>
>

Understood, Big-O notation is really not what I am looking for.

Given all variables are the same, a Hadoop job on a finite set of data
should run for a finite time. There are parts of the process that run
serially and parts that run in parallel, but there must be a way to
express how long a job actually takes (although admittedly it is very
involved to figure out).
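
As a rough illustration of the kind of formula I have in mind, here is a
minimal back-of-envelope sketch. Every number and parameter name in it is a
hypothetical placeholder (per-node throughputs, shuffle fraction, fixed
overhead), not anything measured from Hadoop itself; it just separates the
serial overhead from the phases that scale with the node count:

    def estimate_job_seconds(input_bytes, nodes,
                             map_mb_per_sec, shuffle_mb_per_sec,
                             reduce_mb_per_sec,
                             shuffle_fraction=1.0,
                             fixed_overhead_sec=30):
        """Back-of-envelope MapReduce job time estimate.

        Assumes map, shuffle, and reduce work spreads evenly across
        nodes, while job setup/bookkeeping is a fixed serial cost.
        All throughput figures are hypothetical per-node rates.
        """
        mb = input_bytes / 1e6
        # map phase: input processed in parallel across all nodes
        map_sec = mb / (map_mb_per_sec * nodes)
        # shuffle/sort: only the fraction of data emitted by the maps
        shuffle_sec = (mb * shuffle_fraction) / (shuffle_mb_per_sec * nodes)
        # reduce phase: same shuffled volume, processed in parallel
        reduce_sec = (mb * shuffle_fraction) / (reduce_mb_per_sec * nodes)
        return fixed_overhead_sec + map_sec + shuffle_sec + reduce_sec

    # e.g. 500 GB of input on 20 nodes, with made-up per-node rates
    print(estimate_job_seconds(500e9, 20,
                               map_mb_per_sec=50,
                               shuffle_mb_per_sec=20,
                               reduce_mb_per_sec=30,
                               shuffle_fraction=0.5))

Of course, in practice each of those rates is itself a function of data
skew, map/reduce complexity, and the other variables listed above, which is
exactly where it gets involved.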
