Possible getMapOutput() failures on tasktracker when mapred.reduce.tasks is overridden in job
----------------------------------------------------------------------------------------------
                 Key: HADOOP-1685
                 URL: https://issues.apache.org/jira/browse/HADOOP-1685
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.13.1
         Environment: 6-node cluster, all running Red Hat Enterprise Linux 3.0 Standard Server (Update 4) on Java 6; 2 nodes are Xen virtual machines
            Reporter: Jorgen Johnson
            Priority: Minor

The following error occurs many times on a job where I have defined the number of reduce tasks to be less than the default number of reduce tasks defined in my hadoop-site.xml. Working off my novice understanding of the Hadoop infrastructure at this point, it appears that the JobTracker is not honoring mapred.reduce.tasks as defined in the job conf, and is instead using the default.

Map output lost, rescheduling: getMapOutput(task_0010_m_000002_0,6) failed :
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1911)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:747)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

I.e., hadoop-site.xml defines mapred.reduce.tasks=7, while in my job I define mapred.reduce.tasks=3, and I get many errors looking for:

getMapOutput(task_0010_m_000002_0,3)
getMapOutput(task_0010_m_000002_0,4)
getMapOutput(task_0010_m_000002_0,5)
getMapOutput(task_0010_m_000002_0,6)

The following additional error appears to be a side effect of the actual problem (it stopped happening when I changed the job conf to match the default number of reduce tasks):

task_0010_m_000016_0: log4j:ERROR Failed to close the task's log with the exception:
task_0010_m_000016_0: java.io.IOException: Bad file descriptor
task_0010_m_000016_0:         at java.io.FileOutputStream.writeBytes(Native Method)
task_0010_m_000016_0:         at java.io.FileOutputStream.write(FileOutputStream.java:260)
task_0010_m_000016_0:         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
task_0010_m_000016_0:         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
task_0010_m_000016_0:         at org.apache.hadoop.mapred.TaskLog$Writer.writeIndexRecord(TaskLog.java:251)
task_0010_m_000016_0:         at org.apache.hadoop.mapred.TaskLog$Writer.close(TaskLog.java:235)
task_0010_m_000016_0:         at org.apache.hadoop.mapred.TaskLogAppender.close(TaskLogAppender.java:67)
task_0010_m_000016_0:         at org.apache.log4j.AppenderSkeleton.finalize(AppenderSkeleton.java:124)
task_0010_m_000016_0:         at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
task_0010_m_000016_0:         at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
task_0010_m_000016_0:         at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
task_0010_m_000016_0:         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
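For reference, the conflicting configuration described in this report amounts to the following (a sketch reconstructed from the values given above, not the reporter's actual files): the site file sets a cluster-wide default of 7 reduce tasks, which the job then attempts to override with 3.

```xml
<!-- hadoop-site.xml: cluster-wide default, per the report -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value>
</property>
```

The job-side override could be set either as mapred.reduce.tasks=3 in the job conf or via JobConf.setNumReduceTasks(3) in the old org.apache.hadoop.mapred API. Note that the failing fetches, getMapOutput(...,3) through getMapOutput(...,6), correspond exactly to the four partition indices (3 to 6) that would only exist under the 7-reduce default, consistent with the reporter's theory that the default is being applied somewhere despite the override.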