Hi all,

I'm running a job that converts logs into Hive tables. The run times are very good now that we have added enough nodes, but the reduce phase always takes much longer on one of the machines.

Reduce tasks for job_201110211442_0086 (JobTracker: http://204.236.208.103:50030):

Task                             Complete  Status           Start Time            Finish Time           Duration        Counters
task_201110211442_0086_r_000000  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:09  1mins, 27sec    9
task_201110211442_0086_r_000001  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:10  1mins, 27sec    9
task_201110211442_0086_r_000002  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10  1mins, 27sec    9
task_201110211442_0086_r_000003  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10  1mins, 27sec    9
task_201110211442_0086_r_000004  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:35:56  9mins, 11sec    10
task_201110211442_0086_r_000005  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:28:09  1mins, 24sec    9

As the statistics show, one of the 6 reduce tasks (task_201110211442_0086_r_000004) takes more than 9 minutes, while the other five finish in about a minute and a half each. I think this is because one of the reducers has to spend time sorting the results from the rest of the nodes.

Is there a way to reduce this time?

Thanks in advance,
Raimon Bosch