Hi, I ran a Hadoop MapReduce task in local mode, reading from and writing to HDFS, and it took 2.5 minutes. Essentially the same operations on the local file system, without MapReduce, took half a minute. Is this to be expected?
It seemed that the system lost most of the time in the MapReduce operation, such as after these messages:

09/04/19 23:23:01 INFO mapred.LocalJobRunner: reduce > reduce
09/04/19 23:23:01 INFO mapred.JobClient: map 100% reduce 92%
09/04/19 23:23:04 INFO mapred.LocalJobRunner: reduce > reduce

it waited for a long time. The final output lines were:

09/04/19 23:24:12 INFO mapred.LocalJobRunner: reduce > reduce
09/04/19 23:24:12 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
09/04/19 23:24:12 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost/output
09/04/19 23:24:13 INFO mapred.JobClient: Job complete: job_local_0001
09/04/19 23:24:13 INFO mapred.JobClient: Counters: 13
09/04/19 23:24:13 INFO mapred.JobClient: File Systems
09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes read=138103444
09/04/19 23:24:13 INFO mapred.JobClient: HDFS bytes written=107357785
09/04/19 23:24:13 INFO mapred.JobClient: Local bytes read=282509133
09/04/19 23:24:13 INFO mapred.JobClient: Local bytes written=376697552
09/04/19 23:24:13 INFO mapred.JobClient: Map-Reduce Framework
09/04/19 23:24:13 INFO mapred.JobClient: Reduce input groups=184
09/04/19 23:24:13 INFO mapred.JobClient: Combine output records=185
09/04/19 23:24:13 INFO mapred.JobClient: Map input records=209
09/04/19 23:24:13 INFO mapred.JobClient: Reduce output records=184
09/04/19 23:24:13 INFO mapred.JobClient: Map output bytes=91863989
09/04/19 23:24:13 INFO mapred.JobClient: Map input bytes=69051592
09/04/19 23:24:13 INFO mapred.JobClient: Combine input records=185
09/04/19 23:24:13 INFO mapred.JobClient: Map output records=209
09/04/19 23:24:13 INFO mapred.JobClient: Reduce input records=184
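
To put a number on the wait, the gap between the last "reduce > reduce" message before the pause and the first of the final output lines can be computed from the timestamps. This is only a rough sketch: it assumes the timestamps are in yy/MM/dd HH:mm:ss form, and the helper name seconds_between is made up for illustration.

```python
from datetime import datetime

# Assumption: the log timestamps are formatted as yy/MM/dd HH:mm:ss.
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"


def seconds_between(start: str, end: str) -> float:
    """Return the seconds elapsed between two log timestamps (hypothetical helper)."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the log lines quoted in this message.
stall = seconds_between("09/04/19 23:23:04", "09/04/19 23:24:12")
print(f"Gap during the reduce phase: {stall:.0f} s")
```

By that measure, a bit over a minute of the 2.5 minutes falls between those two reduce messages.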