Re: Performance question

Jean-Daniel Cryans Mon, 20 Apr 2009 06:54:36 -0700

Mark,

Oh sorry, yes, you should expect that kind of delay. One tip for reducing
it on big jobs with lots of tasks is to call
JobConf.setNumTasksToExecutePerJvm(int numTasks), which sets how many
tasks a single JVM may run before it exits, so that JVMs are reused
instead of a new one being spawned for every task.
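
For a rough feel of what JVM reuse saves, here is a minimal sketch that
counts JVM launches under a simplified model. The class and method names
are hypothetical, and the model assumes tasks run one after another in a
single task slot; on a real cluster JVMs are reused per slot, so the
actual number of launches will be higher. It follows the convention of
JobConf.setNumTasksToExecutePerJvm, where 1 (the default) means no reuse
and -1 means no limit.

```java
public class JvmReuseEstimate {

    /**
     * Hypothetical helper: estimates how many JVMs are launched for a job.
     * Assumes a single task slot that runs tasks sequentially.
     *
     * @param numTasks    total number of tasks in the job
     * @param tasksPerJvm value that would be passed to
     *                    JobConf.setNumTasksToExecutePerJvm; -1 means no limit
     */
    static long jvmLaunches(long numTasks, int tasksPerJvm) {
        if (numTasks <= 0) {
            return 0;                       // nothing to run, nothing to launch
        }
        if (tasksPerJvm == -1) {
            return 1;                       // one JVM runs every task
        }
        if (tasksPerJvm < 1) {
            throw new IllegalArgumentException("tasksPerJvm must be -1 or >= 1");
        }
        // Ceiling division: a partially filled last JVM still has to be launched.
        return (numTasks + tasksPerJvm - 1) / tasksPerJvm;
    }

    public static void main(String[] args) {
        System.out.println(jvmLaunches(1000, 1));   // 1000 (default, no reuse)
        System.out.println(jvmLaunches(1000, 10));  // 100
        System.out.println(jvmLaunches(1005, 10));  // 101
        System.out.println(jvmLaunches(1000, -1));  // 1
    }
}
```

In an actual job the only change needed is the one call on your JobConf
before submitting, for example conf.setNumTasksToExecutePerJvm(-1).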


Happy Hadooping!

J-D

On Mon, Apr 20, 2009 at 9:22 AM, Mark Kerzner <markkerz...@gmail.com> wrote:
> Jean-Daniel,
> I realize that, and my question was, is this the normal setup/finishup time,
> about 2 minutes? If it is, then fine. I would expect that on tasks taking
> 10-15 minutes, 2 minutes would be totally justified, and I think that this
> is the guideline - each task should take minutes.
>
> Thank you,
> Mark
>
> On Mon, Apr 20, 2009 at 7:42 AM, Jean-Daniel Cryans <jdcry...@apache.org>
> wrote:
>
>> Mark,
>>
>> There is a setup price when using Hadoop: for each task a new JVM must
>> be spawned. On such a small scale, you won't see any benefit from using MR.
>>
>> J-D
>>
>> On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner <markkerz...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I ran a Hadoop MapReduce task in the local mode, reading and writing from
>> > HDFS, and it took 2.5 minutes. Essentially the same operations on the local
>> > file system without MapReduce took 1/2 minute.  Is this to be expected?
>> >
>> > It seemed that the system lost most of the time in the MapReduce operation,
>> > such as after these messages
>> >
>> > 09/04/19 23:23:01 INFO mapred.LocalJobRunner: reduce > reduce
>> > 09/04/19 23:23:01 INFO mapred.JobClient:  map 100% reduce 92%
>> > 09/04/19 23:23:04 INFO mapred.LocalJobRunner: reduce > reduce
>> >
>> > it waited for a long time. The final output lines were
>> >
>> > 09/04/19 23:24:12 INFO mapred.LocalJobRunner: reduce > reduce
>> > 09/04/19 23:24:12 INFO mapred.TaskRunner: Task
>> > 'attempt_local_0001_r_000000_0' done.
>> > 09/04/19 23:24:12 INFO mapred.TaskRunner: Saved output of task
>> > 'attempt_local_0001_r_000000_0' to hdfs://localhost/output
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Job complete: job_local_0001
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Counters: 13
>> > 09/04/19 23:24:13 INFO mapred.JobClient:   File Systems
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     HDFS bytes read=138103444
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     HDFS bytes written=107357785
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Local bytes read=282509133
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Local bytes written=376697552
>> > 09/04/19 23:24:13 INFO mapred.JobClient:   Map-Reduce Framework
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Reduce input groups=184
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Combine output records=185
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Map input records=209
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Reduce output records=184
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Map output bytes=91863989
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Map input bytes=69051592
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Combine input records=185
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Map output records=209
>> > 09/04/19 23:24:13 INFO mapred.JobClient:     Reduce input records=184
>> >
>>
>
