If just changing the buffer to 4KB makes a big difference, could you at a minimum file a JIRA to change that buffer size? I know that it is not a final fix, but it sure seems like a very nice Band-Aid to put on until we can get to the root of the issue.
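Something along these lines, I'd imagine. This is only a sketch; the constant name and the shape of the copy loop are from memory rather than checked against the 0.20.203 source:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Sketch of the MapOutputServlet copy loop with the proposed band-aid.
    // With chunked transfer encoding, each write() below ends up as roughly
    // one HTTP chunk, so this constant is the chunk size Sven is describing.
    public final class ShuffleCopySketch {
        private static final int MAX_BYTES_TO_READ = 4 * 1024; // was 64 * 1024

        static void copy(InputStream mapOutputIn, OutputStream outStream)
                throws IOException {
            byte[] buffer = new byte[MAX_BYTES_TO_READ];
            int len;
            while ((len = mapOutputIn.read(buffer)) > 0) {
                outStream.write(buffer, 0, len);
            }
        }
    }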
--Bobby Evans

On 1/27/12 9:23 PM, "Sven Groot" <sgr...@gmail.com> wrote:

Hi Nick,

Thanks for your reply. I don't think what you are saying is related: the problem happens while the data is being transferred, and nothing is deserialized during that step. Note that my code isn't involved at all; it's purely Hadoop's own code that's running here.

I have done additional work to find the cause, and it's definitely in Jetty. I created a simple test with Jetty that transfers a file in a manner similar to Hadoop, and it shows the same behavior. It appears to be linked to the buffer size Jetty uses for chunked transfer encoding. Hadoop uses a hardcoded 64KB buffer for that, which exhibits the problem. If I change the buffer to 4KB, Jetty's transfer speed increases by an order of magnitude.

I have posted a question on StackOverflow about this behavior in Jetty: http://stackoverflow.com/questions/9031311/slow-transfers-in-jetty-with-chunked-transfer-encoding-at-certain-buffer-size. So far, there are no answers.

I've always found it a strange decision that Hadoop uses HTTP to transfer intermediate data. Let's just say that this issue reinforces that opinion.

Regards,
Sven

From: Nussbaum, Nick [mailto:nick.nussb...@fticonsulting.com]
Sent: Saturday, January 28, 2012 3:52
To: sgr...@gmail.com
Subject: FW: Reduce shuffle data transfer takes excessively long

I'm no expert on Hadoop, but I have already encountered a surprising gotcha that may be your problem. If you repeatedly use a function like String.getBytes() (http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#getBytes%28%29) that needs to know the default OS character set, it can take a surprisingly long time. I speculate this is due to having to jump through hoops in various sandboxes to read the OS default locale. If that is the case, getting the system locale and charset once and specifying them explicitly in the call to getBytes() (or whatever you're using) should make a big difference.

Let me know if it works for you.

-Nick

From: Sven Groot [mailto:sgr...@gmail.com]
Sent: Thursday, January 26, 2012 10:25 PM
To: mapreduce-user@hadoop.apache.org
Subject: Reduce shuffle data transfer takes excessively long

Hello,

I have been profiling the performance of certain parts of Hadoop 0.20.203.0. For this purpose, I have set up a simple cluster that uses one node as the NameNode/JobTracker and one node as the sole DataNode/TaskTracker.

In this experiment, I run a job consisting of a single map task and a single reduce task. Both simply use the default Mapper/Reducer implementations (the identity functions). The input of the job is a file with a single 256MB block. Therefore, the output of the map task is 256MB, and the reduce task must shuffle that 256MB from the local host. To my surprise, shuffling this amount of data takes around 9 seconds, which is excessively slow.

First I turned my attention to the ReduceTask.ReduceOutputCopier. I determined that about 1.1 seconds is spent calculating checksums (the expected value), and the remaining time is spent reading from the stream returned by URLConnection.getInputStream(). Simple tests with URLConnection could not reproduce the issue except when actually reading from the TaskTracker's MapOutputServlet, so the problem seemed to be on the server side: reading the same amount of data from any other local web server takes only 0.2s.
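The standalone test was essentially the following (a minimal sketch: the URL is a command-line placeholder, and the real shuffle request carries the map/reduce query parameters the servlet expects):

    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;

    // Drain an HTTP response and report the elapsed time. No checksumming,
    // no deserialization -- just read() until EOF, like the copier's raw read.
    public final class DrainTimer {
        public static void main(String[] args) throws Exception {
            URLConnection conn = new URL(args[0]).openConnection();
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            long start = System.nanoTime();
            InputStream in = conn.getInputStream();
            try {
                int n;
                while ((n = in.read(buf)) > 0) {
                    total += n;
                }
            } finally {
                in.close();
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("read %d bytes in %.2f s (%.1f MB/s)%n",
                    total, secs, total / secs / (1024.0 * 1024.0));
        }
    }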
I inserted some measurements into the MapOutputServlet and determined that 0.1s was spent reading the intermediate file (unsurprising, as it was still in the page cache) and 7.7s was spent writing to the stream returned by response.getOutputStream(). The slowdown therefore appears to be in Jetty.

CPU usage during the transfer is low, so it feels like the transfer is being throttled somehow. But if that's the case, I can't figure out how it's happening. There's nothing in the source code to suggest that Hadoop is deliberately throttling anything, and as far as I know Jetty doesn't throttle by default.

I was seeing some warnings in the tasktracker log file related to this: http://wiki.eclipse.org/Jetty/Feature/JVM_NIO_Bug. However, running Hadoop under Java 7 made those warnings disappear while the transfer remained slow, so I don't think that's it.

I'm out of ideas as to what could be causing this. Any insights?

Regards,
Sven
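P.S. For reference, this is the gist of the instrumentation (simplified; the variable names are mine rather than the servlet's, and the 64KB buffer mirrors the hardcoded one):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Time the read side (intermediate file) and the write side
    // (response.getOutputStream()) of the copy loop separately.
    final class TimedCopy {
        static void copy(InputStream fileIn, OutputStream httpOut)
                throws IOException {
            byte[] buf = new byte[64 * 1024];
            long readNanos = 0, writeNanos = 0;
            while (true) {
                long t0 = System.nanoTime();
                int n = fileIn.read(buf);
                readNanos += System.nanoTime() - t0;
                if (n <= 0) break;
                long t1 = System.nanoTime();
                httpOut.write(buf, 0, n);
                writeNanos += System.nanoTime() - t1;
            }
            System.out.printf("read: %.1f s, write: %.1f s%n",
                    readNanos / 1e9, writeNanos / 1e9);
        }
    }

Wrapping only the read() and write() calls keeps the page-cache read time and the Jetty write time cleanly separated, which is how I got the 0.1s/7.7s split above.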