----- Original Message -----
From: Russell Brown <misterr...@gmail.com>
Date: Friday, November 4, 2011 9:11 pm
Subject: Re: Never ending reduce jobs, error Error reading task outputConnection refused
To: mapreduce-user@hadoop.apache.org
> On 4 Nov 2011, at 15:35, Uma Maheswara Rao G 72686 wrote:
>
> > This problem may come if you don't configure the host mappings
> > properly. Can you check whether your tasktrackers are pingable
> > from each other with the configured host names?
>
> Hi,
> Thanks for replying so fast!
>
> Hostnames? I use IP addresses in the slaves config file, and via
> IP addresses everyone can ping everyone else. Do I need to set up
> hostnames too?

Yes, can you configure hostname mappings and check?

> Cheers
> Russell
>
> > Regards,
> > Uma
>
> > ----- Original Message -----
> > From: Russell Brown <misterr...@gmail.com>
> > Date: Friday, November 4, 2011 9:00 pm
> > Subject: Never ending reduce jobs, error Error reading task outputConnection refused
> > To: mapreduce-user@hadoop.apache.org
> >
> >> Hi,
> >> I have a cluster of 4 tasktracker/datanodes and 1
> >> JobTracker/Namenode. I can run small jobs on this cluster fine
> >> (up to a few thousand keys), but with more than that I start
> >> seeing errors like this:
> >>
> >> 11/11/04 08:16:08 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000005_0, Status : FAILED
> >> Too many fetch-failures
> >> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
> >> 11/11/04 08:16:08 WARN mapred.JobClient: Error reading task outputConnection refused
> >> 11/11/04 08:16:13 INFO mapred.JobClient: map 97% reduce 1%
> >> 11/11/04 08:16:25 INFO mapred.JobClient: map 100% reduce 1%
> >> 11/11/04 08:17:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000010_0, Status : FAILED
> >> Too many fetch-failures
> >> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
> >> 11/11/04 08:17:20 WARN mapred.JobClient: Error reading task outputConnection refused
> >> 11/11/04 08:17:24 INFO mapred.JobClient: map 97% reduce 1%
> >> 11/11/04 08:17:36 INFO mapred.JobClient: map 100% reduce 1%
> >> 11/11/04 08:19:20 INFO mapred.JobClient: Task Id : attempt_201111040342_0006_m_000011_0, Status : FAILED
> >> Too many fetch-failures
> >>
> >> I have no idea what this means. All my nodes can ssh to each
> >> other, passwordlessly, all the time.
> >>
> >> On the individual data/task nodes the logs have errors like this:
> >>
> >> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201111040342_0006_m_000015_0,2) failed :
> >> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/vagrant/jobcache/job_201111040342_0006/attempt_201111040342_0006_m_000015_0/output/file.out.index in any of the configured local directories
> >>   at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
> >>   at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
> >>   at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3543)
> >>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >>   at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:816)
> >>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>   at org.mortbay.jetty.Server.handle(Server.java:326)
> >>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >>   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> >>   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >>
> >> 2011-11-04 08:24:42,514 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child with bad map output: attempt_201111040342_0006_m_000015_0. Ignored.
> >>
> >> Are they related? What do any of them mean?
> >>
> >> If I use a much smaller amount of data I don't see any of these
> >> errors and everything works fine, so I guess they are to do with
> >> some resource (though which, I don't know). Looking at
> >> MASTERNODE:50070/dfsnodelist.jsp?whatNodes=LIVE
> >> I see that the datanodes have ample disk space, so that isn't it…
> >>
> >> Any help at all really appreciated. Searching for the errors on
> >> Google got me nothing; reading the Hadoop definitive guide got me
> >> nothing.
> >> Many thanks in advance
> >>
> >> Russell

Regards,
Uma
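[Editor's note: "Too many fetch-failures" typically means reducers cannot fetch map output over HTTP, because each tasktracker advertises itself by hostname and the other nodes cannot resolve that name. A minimal sketch of the host mapping Uma suggests, with entirely placeholder addresses and hostnames, is an identical /etc/hosts on every node in the cluster:

```
# /etc/hosts -- same file on every node; addresses and names are placeholders
192.168.1.10   hadoop-master
192.168.1.11   hadoop-slave1
192.168.1.12   hadoop-slave2
192.168.1.13   hadoop-slave3
192.168.1.14   hadoop-slave4
```

Each node's own hostname should also map to its routable address here, not to 127.0.0.1, or it will advertise a loopback address that peers cannot reach.]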
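[Editor's note: the tasktracker-side DiskChecker error says the map-output index file was not found under any directory listed in mapred.local.dir. Assuming a 0.20-era mapred-site.xml (the path below is a placeholder), it is worth confirming the property points at a directory that exists and is writable by the tasktracker user on every node:

```
<!-- mapred-site.xml fragment; placeholder path, must exist and be
     writable by the user running the TaskTracker on each node -->
<property>
  <name>mapred.local.dir</name>
  <value>/var/lib/hadoop/mapred/local</value>
</property>
```
]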