slow copy makes reduce hang

Rong-en Fan Thu, 18 Sep 2008 12:35:03 -0700

Hi,

I'm using 0.17.2.1 and saw a reduce task hang in the shuffle phase due
to an unresponsive node. According to the reduce log (sorry, I didn't
keep it around), the task was stuck copying map output from a dead
node (I cannot ssh to it). At that point, all maps had already
finished. I'm wondering why this slow copy does not trigger a failure
of the reduce task and mark the corresponding map as failed (even
though it has finished), so that the map task is redone on another
node and the reduce can proceed.


Thanks,
Rong-En Fan
