Hi all,

I'm running a job that converts logs into Hive tables. The run times are very
good now that we have added a proper number of nodes, but the reduce phase
always takes much longer on one of the machines.

Task                             Complete  Status           Start Time            Finish Time                          Counters
task_201110211442_0086_r_000000  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:09 (1mins, 27sec)  9
task_201110211442_0086_r_000001  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000002  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000003  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000004  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:35:56 (9mins, 11sec)  10
task_201110211442_0086_r_000005  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:28:09 (1mins, 24sec)  9

As you can see from the statistics, out of 6 reduce tasks one takes about 9
minutes while the others take about 1 minute each. I think this is because one
of the reducers has to spend time sorting the results from the rest of the
nodes.
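
To put a number on the gap, here is a minimal Python sketch that recomputes
each reducer's wall-clock time from the start and finish timestamps listed
above and flags any task that ran more than twice the median. The names
TASKS, durations and stragglers, and the 2x threshold, are only illustrative
choices of mine; nothing here comes from Hadoop or Hive.

```python
from datetime import datetime
from statistics import median

# Timestamp layout used by the JobTracker page, e.g. "23-Oct-2011 00:26:42".
# Note: %b parses English month abbreviations only under a C/English locale.
FMT = "%d-%b-%Y %H:%M:%S"

# (task suffix, start, finish) copied by hand from the task list above.
TASKS = [
    ("r_000000", "23-Oct-2011 00:26:42", "23-Oct-2011 00:28:09"),
    ("r_000001", "23-Oct-2011 00:26:42", "23-Oct-2011 00:28:10"),
    ("r_000002", "23-Oct-2011 00:26:43", "23-Oct-2011 00:28:10"),
    ("r_000003", "23-Oct-2011 00:26:43", "23-Oct-2011 00:28:10"),
    ("r_000004", "23-Oct-2011 00:26:44", "23-Oct-2011 00:35:56"),
    ("r_000005", "23-Oct-2011 00:26:44", "23-Oct-2011 00:28:09"),
]


def durations(tasks):
    """Return {task: elapsed seconds} computed from start/finish strings."""
    return {
        name: (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
        for name, start, end in tasks
    }


def stragglers(tasks, factor=2.0):
    """Return {task: duration / median duration} for tasks slower than factor x median."""
    secs = durations(tasks)
    typical = median(secs.values())
    return {name: s / typical for name, s in secs.items() if s > factor * typical}


if __name__ == "__main__":
    for name, ratio in stragglers(TASKS).items():
        print(f"{name} ran {ratio:.1f}x longer than the median reducer")
```

This only measures how uneven the reducers are; it does not say whether the
cause is sorting, as I suspect, or something else.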

Is there a way to reduce this time?

Thanks in advance,
Raimon Bosch
