Possible getMapOutput() failures on tasktracker when mapred.reduce.tasks is overridden in job
--------------------------------------------------------------------------------------------
Key: HADOOP-1685
URL: https://issues.apache.org/jira/browse/HADOOP-1685
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.13.1
Environment: 6-node cluster, all running Red Hat Enterprise Linux 3.0 Standard Server (Update 4) on Java 6; 2 nodes are Xen virts
Reporter: Jorgen Johnson
Priority: Minor
The following error occurs many times in a job where I have set the number of reduce tasks lower than the default defined in my hadoop-site.xml. Based on my (novice) understanding of the Hadoop infrastructure, it appears that the JobTracker is not honoring mapred.reduce.tasks as defined in the job conf, and is instead using the site default.
Map output lost, rescheduling: getMapOutput(task_0010_m_000002_0,6) failed :
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1911)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:747)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
For example: hadoop-site.xml defines mapred.reduce.tasks=7, while in my job I define mapred.reduce.tasks=3. I then get many errors looking for partitions 3 through 6, which should not exist with only 3 reducers:
getMapOutput(task_0010_m_000002_0,3)
getMapOutput(task_0010_m_000002_0,4)
getMapOutput(task_0010_m_000002_0,5)
getMapOutput(task_0010_m_000002_0,6)
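For reference, the behavior I expect is that a value set in the job conf takes precedence over the site default, so only partitions 0-2 should ever be requested. A minimal sketch of that expected precedence (this is not Hadoop's actual Configuration class, just an illustration of the lookup order):

```java
import java.util.Properties;

// Hypothetical illustration of the expected precedence: a per-job
// setting for mapred.reduce.tasks should override the site default.
public class ReduceTaskPrecedence {
    public static int effectiveReduceTasks(Properties siteDefaults, Properties jobConf) {
        // Job conf wins; fall back to the hadoop-site.xml default otherwise.
        String v = jobConf.getProperty("mapred.reduce.tasks",
                siteDefaults.getProperty("mapred.reduce.tasks"));
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties site = new Properties();
        site.setProperty("mapred.reduce.tasks", "7"); // hadoop-site.xml default
        Properties job = new Properties();
        job.setProperty("mapred.reduce.tasks", "3");  // per-job override
        System.out.println(effectiveReduceTasks(site, job)); // expected: 3
    }
}
```

If the TaskTracker's shuffle were honoring this precedence, getMapOutput() would never be asked for a partition index at or above the job's reduce count.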
This additional error appears to be a side-effect of the actual problem (it stopped happening when I changed the job conf to match the default number of reduce tasks):
task_0010_m_000016_0: log4j:ERROR Failed to close the task's log with the exception: java.io.IOException: Bad file descriptor
task_0010_m_000016_0:   at java.io.FileOutputStream.writeBytes(Native Method)
task_0010_m_000016_0:   at java.io.FileOutputStream.write(FileOutputStream.java:260)
task_0010_m_000016_0:   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
task_0010_m_000016_0:   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLog$Writer.writeIndexRecord(TaskLog.java:251)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLog$Writer.close(TaskLog.java:235)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLogAppender.close(TaskLogAppender.java:67)
task_0010_m_000016_0:   at org.apache.log4j.AppenderSkeleton.finalize(AppenderSkeleton.java:124)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
task_0010_m_000016_0:   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.