Mark,

Sorry, yes, you should expect that kind of delay. One way to reduce it on big jobs with lots of tasks is JobConf.setNumTasksToExecutePerJvm(int numTasks), which sets how many tasks a single JVM may run, so that JVMs are reused instead of a new one being spawned for every task.
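To give a rough feel for why reuse matters on jobs with many tasks, here is a back-of-the-envelope sketch in plain Java. It does not call Hadoop at all: the class name, the task count and the per-launch cost are made-up numbers for illustration, and the model ignores parallel task slots (it assumes tasks run one after another). The only thing borrowed from Hadoop is the convention that -1 means "no limit" on tasks per JVM.

```java
/**
 * Idealized estimate of JVM launches with and without reuse.
 * Hypothetical helper for illustration only; not part of Hadoop.
 */
public final class JvmReuseEstimate {
    private JvmReuseEstimate() {}

    /**
     * Number of JVM launches needed to run numTasks tasks sequentially
     * when each JVM may run up to tasksPerJvm tasks (-1 means no limit).
     */
    public static long jvmLaunches(long numTasks, int tasksPerJvm) {
        if (numTasks < 0) {
            throw new IllegalArgumentException("numTasks must be >= 0");
        }
        if (tasksPerJvm == 0 || tasksPerJvm < -1) {
            throw new IllegalArgumentException("tasksPerJvm must be -1 or >= 1");
        }
        if (numTasks == 0) {
            return 0;
        }
        if (tasksPerJvm == -1) {
            return 1; // one JVM serves every task in this simplified model
        }
        // Ceiling division: a partially filled last JVM still has to be launched.
        return (numTasks + tasksPerJvm - 1) / tasksPerJvm;
    }

    /** Total startup time, assuming a fixed (hypothetical) cost per launch. */
    public static long startupSeconds(long numTasks, int tasksPerJvm, long secondsPerLaunch) {
        return jvmLaunches(numTasks, tasksPerJvm) * secondsPerLaunch;
    }

    public static void main(String[] args) {
        long tasks = 1000;        // hypothetical job size
        long secondsPerLaunch = 1; // hypothetical JVM startup cost
        for (int reuse : new int[] {1, 10, -1}) {
            System.out.printf("tasksPerJvm=%d: %d launches, about %d s of startup%n",
                    reuse, jvmLaunches(tasks, reuse),
                    startupSeconds(tasks, reuse, secondsPerLaunch));
        }
    }
}
```

In a real job the value modeled here as tasksPerJvm is what you would pass to JobConf.setNumTasksToExecutePerJvm(int numTasks); the actual savings depend on your cluster and on how many task slots run in parallel.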
Happy Hadooping!

J-D

On Mon, Apr 20, 2009 at 9:22 AM, Mark Kerzner <markkerz...@gmail.com> wrote:
> Jean-Daniel,
> I realize that, and my question was, is this the normal setup/finishup time,
> about 2 minutes? If it is, then fine. I would expect that on tasks taking
> 10-15 minutes, 2 minutes would be totally justified, and I think that this
> is the guideline - each task should take minutes.
>
> Thank you,
> Mark
>
> On Mon, Apr 20, 2009 at 7:42 AM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>
>> Mark,
>>
>> There is a setup price when using Hadoop, for each task a new JVM must
>> be spawned. On such a small scale, you won't see any good using MR.
>>
>> J-D
>>
>> On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner <markkerz...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I ran a Hadoop MapReduce task in the local mode, reading and writing from
>> > HDFS, and it took 2.5 minutes. Essentially the same operations on the local
>> > file system without MapReduce took 1/2 minute. Is this to be expected?
>> >
>> > It seemed that the system lost most of the time in the MapReduce operation,
>> > such as after these messages
>> >
>> > 09/04/19 23:23:01 INFO mapred.LocalJobRunner: reduce > reduce
>> > 09/04/19 23:23:01 INFO mapred.JobClient: map 100% reduce 92%
>> > 09/04/19 23:23:04 INFO mapred.LocalJobRunner: reduce > reduce
>> >
>> > it waited for a long time. The final output lines were
>> >
>> > 09/04/19 23:24:12 INFO mapred.LocalJobRunner: reduce > reduce
>> > 09/04/19 23:24:12 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
>> > 09/04/19 23:24:12 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost/output
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Job complete: job_local_0001
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Counters: 13
>> > 09/04/19 23:24:13 INFO mapred.JobClient: File Systems
>> > 09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes read=138103444
>> > 09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes written=107357785
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Local bytes read=282509133
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Local bytes written=376697552
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Map-Reduce Framework
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Reduce input groups=184
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Combine output records=185
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Map input records=209
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Reduce output records=184
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Map output bytes=91863989
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Map input bytes=69051592
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Combine input records=185
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Map output records=209
>> > 09/04/19 23:24:13 INFO mapred.JobClient: Reduce input records=184
>> >