[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Kołaczkowski updated MAPREDUCE-4506:
------------------------------------------

    Status: Patch Available  (was: Open)

I attach a patch disabling the 'break connection' feature.
                
> EofException / 'connection reset by peer' while copying map output 
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4506
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.0.3
>         Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33
>            Reporter: Piotr Kołaczkowski
>            Priority: Minor
>         Attachments: RamManager.patch, ReduceTask.patch
>
>
> When running complex mapreduce jobs with many mappers and reducers (e.g. 8 
> mappers, 8 reducers on a 8 core machine), sometimes the following exceptions 
> pop up in the logs during the shuffle phase:
> {noformat}
> WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java 
> (line 3894) getMapOutput(attempt_201207161621_0217_m_000071_0,0) failed :
> org.mortbay.jetty.EofException
>         at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
>         at 
> org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568)
>         at 
> org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005)
>         at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648)
>         at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579)
>         at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
>         at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
>         at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
>         at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
>         at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
>         at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
>         at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>         at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.write0(Native Method)
>         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
>         at sun.nio.ch.IOUtil.write(IOUtil.java:43)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
>         at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
>         at 
> org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
>         at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
> {noformat}
> The problem looks like some network problems at first, however it turns out 
> that hadoop shuffleInMemory sometimes deliberately closes map-output-copy 
> connections just to reopen them a few milliseconds later, because of 
> temporary unavailability of free memory. Because the sending side does not 
> expect this, an exception is thrown. Additionally this leads to wasting 
> resources on the sender side, which does more work than required serving 
> additional requests. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to