Hi list, I have jobs that generate a huge amount of intermediate data. For example, one of my jobs generates almost 12 GB of map output. I have 8 datanodes/TaskTrackers and 1 master.
My reduce progress shows a copy speed in the range 0.55 - 1 MBps, but normal file transfers between my datanodes generally reach 40-50 MBps. Why is my shuffle so slow? Also, how is that number calculated, and what exactly does it signify? (Is it the average speed from all mappers to that particular reducer, or something else?) Any suggestions? Thanks