The log shows that Eclipse is running the job locally:

10/03/28 11:29:16 INFO mapred.LocalJobRunner:

Hence the difference in the run times. Normally, Hadoop jobs submitted to a real cluster have a startup time of at least 10 seconds or so; the LocalJobRunner skips that overhead entirely.
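If the goal is to have the Eclipse run hit the cluster as well, the usual fix is to give the in-IDE Configuration the cluster's filesystem and jobtracker settings and to set the job jar. A rough sketch, assuming 0.20-style configuration keys (the host names, ports, and class name below are placeholders, not taken from your code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitFromIde {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without these two settings, an in-IDE run falls back to the local
        // defaults (file:/// and "local"), which is exactly what the
        // job_local_0006 / LocalJobRunner lines in your Eclipse log show.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        Job job = new Job(conf, "top scores");
        // Addresses the "No job jar file set" warning; this only works if the
        // classes are packaged in a jar on the classpath. Otherwise, point
        // Hadoop at a built jar, e.g. conf.set("mapred.jar", "/path/to/job.jar").
        job.setJarByClass(SubmitFromIde.class);
        // ... set mapper, reducer, and input/output paths as in your job ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Alternatively, putting the cluster's core-site.xml/mapred-site.xml (or hadoop-site.xml) on the Eclipse run configuration's classpath should have the same effect, since Configuration picks those files up automatically.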
-Abhishek.

On Sun, Mar 28, 2010 at 2:10 PM, Something Something <[email protected]> wrote:

> When I run a Hadoop job in a command shell it runs in 21 secs, but when I
> run the same from within Eclipse IDE it runs in 1 second! Trying to figure
> out what the reason is. The output is shown below:
>
> 1) It appears the FILE_BYTES_READ, HDFS_BYTES_WRITTEN, etc. values are
> totally different. Sounds like that's what is causing the performance
> degradation.
> 2) I tried setting -Xmx1000m in the command shell but that doesn't help.
> Are there other JVM arguments that affect the values of these variables
> (FILE_BYTES_READ etc.)?
>
> Any hints would be GREATLY appreciated. Thanks.
>
> *Command shell log:*
>
> INFO: Starting top scores writing job....
> 10/03/28 11:19:49 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/03/28 11:19:50 INFO mapred.JobClient: Running job: job_201003281008_0006
> 10/03/28 11:19:51 INFO mapred.JobClient: map 0% reduce 0%
> 10/03/28 11:19:59 INFO mapred.JobClient: map 100% reduce 0%
> 10/03/28 11:20:04 INFO mapred.JobClient: map 100% reduce 100%
> 10/03/28 11:20:10 INFO mapred.JobClient: Job complete: job_201003281008_0006
> 10/03/28 11:20:10 INFO mapred.JobClient: Counters: 16
> 10/03/28 11:20:10 INFO mapred.JobClient: Job Counters
> 10/03/28 11:20:10 INFO mapred.JobClient: Launched reduce tasks=1
> 10/03/28 11:20:10 INFO mapred.JobClient: Launched map tasks=1
> 10/03/28 11:20:10 INFO mapred.JobClient: Data-local map tasks=1
> 10/03/28 11:20:10 INFO mapred.JobClient: FileSystemCounters
> 10/03/28 11:20:10 INFO mapred.JobClient: FILE_BYTES_READ=1736
> 10/03/28 11:20:10 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3504
> 10/03/28 11:20:10 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3706
> 10/03/28 11:20:10 INFO mapred.JobClient: Map-Reduce Framework
> 10/03/28 11:20:10 INFO mapred.JobClient: Reduce input groups=87
> 10/03/28 11:20:10 INFO mapred.JobClient: Combine output records=0
> 10/03/28 11:20:10 INFO mapred.JobClient: Map input records=100
> 10/03/28 11:20:10 INFO mapred.JobClient: Reduce shuffle bytes=0
> 10/03/28 11:20:10 INFO mapred.JobClient: Reduce output records=100
> 10/03/28 11:20:10 INFO mapred.JobClient: Spilled Records=200
> 10/03/28 11:20:10 INFO mapred.JobClient: Map output bytes=1530
> 10/03/28 11:20:10 INFO mapred.JobClient: Combine input records=0
> 10/03/28 11:20:10 INFO mapred.JobClient: Map output records=100
> 10/03/28 11:20:10 INFO mapred.JobClient: Reduce input records=100
>
> *Total time for writing scores: 21 secs.*
>
> *Eclipse console log:*
>
> INFO: Starting top scores writing job....
> 10/03/28 11:29:15 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 10/03/28 11:29:15 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/03/28 11:29:15 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 10/03/28 11:29:15 INFO mapred.JobClient: Running job: job_local_0006
> 10/03/28 11:29:15 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/28 11:29:16 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/28 11:29:16 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/28 11:29:16 INFO mapred.MapTask: Starting flush of map output
> 10/03/28 11:29:16 INFO mapred.MapTask: Finished spill 0
> 10/03/28 11:29:16 INFO mapred.TaskRunner: Task:attempt_local_0006_m_000000_0 is done. And is in the process of commiting
> 10/03/28 11:29:16 INFO mapred.LocalJobRunner:
> 10/03/28 11:29:16 INFO mapred.TaskRunner: Task 'attempt_local_0006_m_000000_0' done.
> 10/03/28 11:29:16 INFO mapred.LocalJobRunner:
> 10/03/28 11:29:16 INFO mapred.Merger: Merging 1 sorted segments
> 10/03/28 11:29:16 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1732 bytes
> 10/03/28 11:29:16 INFO mapred.LocalJobRunner:
> 10/03/28 11:29:16 INFO mapred.TaskRunner: Task:attempt_local_0006_r_000000_0 is done. And is in the process of commiting
> 10/03/28 11:29:16 INFO mapred.LocalJobRunner:
> 10/03/28 11:29:16 INFO mapred.TaskRunner: Task attempt_local_0006_r_000000_0 is allowed to commit now
> 10/03/28 11:29:16 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0006_r_000000_0' to markers/results
> 10/03/28 11:29:16 INFO mapred.LocalJobRunner: reduce > reduce
> 10/03/28 11:29:16 INFO mapred.TaskRunner: Task 'attempt_local_0006_r_000000_0' done.
> 10/03/28 11:29:16 INFO mapred.JobClient: map 100% reduce 100%
> 10/03/28 11:29:16 INFO mapred.JobClient: Job complete: job_local_0006
> 10/03/28 11:29:16 INFO mapred.JobClient: Counters: 14
> 10/03/28 11:29:16 INFO mapred.JobClient: FileSystemCounters
> 10/03/28 11:29:16 INFO mapred.JobClient: FILE_BYTES_READ=95848
> 10/03/28 11:29:16 INFO mapred.JobClient: HDFS_BYTES_READ=277286
> 10/03/28 11:29:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=304886
> 10/03/28 11:29:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=209112
> 10/03/28 11:29:16 INFO mapred.JobClient: Map-Reduce Framework
> 10/03/28 11:29:16 INFO mapred.JobClient: Reduce input groups=87
> 10/03/28 11:29:16 INFO mapred.JobClient: Combine output records=0
> 10/03/28 11:29:16 INFO mapred.JobClient: Map input records=100
> 10/03/28 11:29:16 INFO mapred.JobClient: Reduce shuffle bytes=0
> 10/03/28 11:29:16 INFO mapred.JobClient: Reduce output records=100
> 10/03/28 11:29:16 INFO mapred.JobClient: Spilled Records=200
> 10/03/28 11:29:16 INFO mapred.JobClient: Map output bytes=1530
> 10/03/28 11:29:16 INFO mapred.JobClient: Combine input records=0
> 10/03/28 11:29:16 INFO mapred.JobClient: Map output records=100
> 10/03/28 11:29:16 INFO mapred.JobClient: Reduce input records=100
>
> *Total time for writing scores: 1 secs.*
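One more note: the "Use GenericOptionsParser for parsing the arguments. Applications should implement Tool" warning shows up in both logs above. Implementing Tool and launching through ToolRunner silences it, and also lets you switch between local and cluster execution with -fs/-jt/-D options at launch time without code changes. A minimal skeleton (the class name and job name here are made up for illustration, not from the original code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TopScoresDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already has any -D/-conf/-fs/-jt generic options applied
        // by ToolRunner, so local vs. cluster is decided at launch time.
        Job job = new Job(getConf(), "top scores");
        job.setJarByClass(TopScoresDriver.class);
        // ... mapper, reducer, and input/output paths as in the original job ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new TopScoresDriver(), args));
    }
}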
