Re: Hadoop Smoke Test: TERASORT

2014-09-10 Thread Rich Haase
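You can set the number of reducers for a Hadoop job from the command line
with -Dmapred.reduce.tasks=XX, provided the job parses generic options
through ToolRunner (terasort does).  In Hadoop 2.x that property name is
deprecated in favor of mapreduce.job.reduces, but the old name still works.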

e.g.

hadoop jar hadoop-mapreduce-examples.jar terasort -Dmapred.reduce.tasks=10 /terasort-input /terasort-output
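
As a rough sketch of the same idea applied to the command from the original
post: the small script below only builds and prints a terasort command line
with an explicit reducer count, using the Hadoop 2.x property name
mapreduce.job.reduces.  The function name build_terasort_cmd and the reducer
count of 10 are illustrative assumptions, not something from the thread; the
jar path and the input/output directories are the ones Arthur used.

```shell
#!/bin/sh
# Jar path as given in the original post (relative to the Hadoop install dir).
# Single quotes keep the * literal here; the shell expands it when the
# printed command is actually run.
EXAMPLES_JAR='share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar'

# Hypothetical helper: print a terasort command with an explicit reducer count.
# Usage: build_terasort_cmd REDUCERS INPUT_DIR OUTPUT_DIR
build_terasort_cmd() {
  reducers=$1
  input_dir=$2
  output_dir=$3
  # Generic -D options go after the program name (terasort) and before
  # the input/output paths.
  printf 'bin/hadoop jar %s terasort -Dmapreduce.job.reduces=%s %s %s\n' \
    "$EXAMPLES_JAR" "$reducers" "$input_dir" "$output_dir"
}

# Dry run: print the command instead of executing it.
build_terasort_cmd 10 /tmp/teragenout /tmp/terasortout
```

The script prints the command rather than running it, so it can be checked
first and then pasted into a shell in the Hadoop install directory.  A
suitable reducer count depends on the cluster, so 10 is only a placeholder.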


Hadoop Smoke Test: TERASORT

2014-09-10 Thread arthur.hk.c...@gmail.com
Hi,

I am running the smoke test for Hadoop (2.4.1).  For “terasort”, my test
command is below.  The map phase completed very quickly because it was split
into many tasks, but the reduce phase takes a very long time and has only 1
reduce task running.  Is there a way to speed up the reduce phase by
splitting the large reduce job into many smaller ones and running them
across the cluster, like the map phase?


bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar terasort /tmp/teragenout /tmp/terasortout


Job ID                  Name      State    Maps Total  Maps Completed  Reduce Total  Reduce Completed
job_1409876705457_0002  TeraSort  RUNNING  22352       22352           1             0


Regards
Arthur