Hello,
I have been profiling the performance of certain parts of Hadoop 0.20.203.0. For this purpose, I have set up a simple cluster with one node acting as the NameNode/JobTracker and one node as the sole DataNode/TaskTracker. In this experiment I run a job consisting of a single map task and a single reduce task, both using the default Mapper/Reducer implementations (the identity functions). The input of the job is a file with a single 256MB block, so the output of the map task is 256MB, and the reduce task must shuffle that 256MB from the local host. To my surprise, shuffling this amount of data takes around 9 seconds, which is excessively slow.

First I turned my attention to the ReduceTask.ReduceOutputCopier. I determined that about 1.1 seconds is spent calculating checksums (the expected value), and the remaining time is spent reading from the stream returned by URLConnection.getInputStream(). Some simple tests with URLConnection could not reproduce the issue unless it was actually reading from the TaskTracker's MapOutputServlet, so the problem seemed to be on the server side; reading the same amount of data from any other local web server takes only 0.2s.

I inserted some measurements into the MapOutputServlet and determined that 0.1s was spent reading the intermediate file (unsurprising, as it was still in the page cache) and 7.7s were spent writing to the stream returned by response.getOutputStream(). The slowdown therefore appears to be in Jetty.

CPU usage during the transfer is low, so it feels like the transfer is being throttled somehow, but if that's the case I can't figure out how that's happening. There's nothing in the source code to suggest Hadoop is deliberately throttling anything, and as far as I know Jetty doesn't throttle by default.

I was also seeing some warnings in the TaskTracker log file related to this: http://wiki.eclipse.org/Jetty/Feature/JVM_NIO_Bug
However, running Hadoop under Java 7 made those warnings disappear and the transfer is still slow, so I don't think that's the cause.

I'm out of ideas as to what could be causing this. Any insights?

Regards,
Sven
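
P.S. For reference, below is roughly the kind of standalone test I meant by "simple tests with URLConnection". The class name and the placeholder mapOutput URL are illustrative only (it assumes the TaskTracker's default 50060 HTTP port); it just drains the response stream and reports the throughput, with no checksumming.

// Standalone test: open the map output URL and time how long it takes
// to drain the response stream into a buffer.
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class ShuffleReadTimer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: the real request carries the actual job/map
        // attempt IDs and goes to the TaskTracker's HTTP port.
        String target = args.length > 0 ? args[0]
                : "http://localhost:50060/mapOutput?job=<jobid>&map=<attemptid>&reduce=0";

        URLConnection conn = new URL(target).openConnection();
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long start = System.nanoTime();
        InputStream in = conn.getInputStream();
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } finally {
            in.close();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("read %d bytes in %.2f s (%.1f MB/s)%n",
                total, seconds, total / 1048576.0 / seconds);
    }
}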