Possible getMapOutput() failures on tasktracker when mapred.reduce.tasks is 
overridden in job
--------------------------------------------------------------------------------------------

                 Key: HADOOP-1685
                 URL: https://issues.apache.org/jira/browse/HADOOP-1685
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.13.1
         Environment: 6-node cluster, all nodes running Redhat Enterprise 3.0 
Standard Server (Update 4) with Java 6; 2 nodes are Xen virtual machines
            Reporter: Jorgen Johnson
            Priority: Minor


The following error occurs many times in a job where I set the number of 
reduce tasks lower than the default number of reduce tasks defined in my 
hadoop-site.xml. Based on my novice understanding of the Hadoop 
infrastructure, it appears that the JobTracker is not honoring 
mapred.reduce.tasks as defined in the job conf and is using the default 
instead.

Map output lost, rescheduling: getMapOutput(task_0010_m_000002_0,6) failed :
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1911)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:747)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

For example:
hadoop-site.xml defines mapred.reduce.tasks=7.
In my job I define mapred.reduce.tasks=3.

I get many errors looking for:
getMapOutput(task_0010_m_000002_0,3)
getMapOutput(task_0010_m_000002_0,4)
getMapOutput(task_0010_m_000002_0,5)
getMapOutput(task_0010_m_000002_0,6)
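
The failing requests listed above can be checked against the two settings with a small standalone sketch. It is an illustration only, not Hadoop code: the helper names are hypothetical, and it assumes reduce partitions are numbered from 0 to (number of reduces - 1), which is consistent with the report but is not a confirmed diagnosis of the bug.

```python
import re

# Values taken from the report: the site-wide default and the per-job override.
SITE_DEFAULT_REDUCES = 7  # mapred.reduce.tasks in hadoop-site.xml
JOB_REDUCES = 3           # mapred.reduce.tasks set in the job

# The getMapOutput requests quoted in the report.
LOG_LINES = [
    "getMapOutput(task_0010_m_000002_0,3)",
    "getMapOutput(task_0010_m_000002_0,4)",
    "getMapOutput(task_0010_m_000002_0,5)",
    "getMapOutput(task_0010_m_000002_0,6)",
]

# Hypothetical pattern: captures the map task id and the trailing partition index.
_REQUEST = re.compile(r"getMapOutput\((task_\w+),(\d+)\)")


def requested_partitions(lines):
    """Return the sorted, de-duplicated partition indexes requested in the lines."""
    found = set()
    for line in lines:
        match = _REQUEST.search(line)
        if match:
            found.add(int(match.group(2)))
    return sorted(found)


def out_of_range(partitions, num_reduces):
    """Partitions that cannot exist if indexes run from 0 to num_reduces - 1."""
    return [p for p in partitions if p >= num_reduces]


if __name__ == "__main__":
    parts = requested_partitions(LOG_LINES)
    print("requested partitions:", parts)
    print("invalid under the job setting:", out_of_range(parts, JOB_REDUCES))
    print("invalid under the site default:", out_of_range(parts, SITE_DEFAULT_REDUCES))
```

Under that assumption, every failing request targets a partition that would exist with 7 reduces but not with 3, which matches the suspicion that the default value is being used somewhere instead of the job's value.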

This additional error appears to be a side effect of the actual problem (it 
stopped happening when I changed the job conf to match the default number of 
reduce tasks):
task_0010_m_000016_0: log4j:ERROR Failed to close the task's log with the exception: java.io.IOException: Bad file descriptor
task_0010_m_000016_0:   at java.io.FileOutputStream.writeBytes(Native Method)
task_0010_m_000016_0:   at java.io.FileOutputStream.write(FileOutputStream.java:260)
task_0010_m_000016_0:   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
task_0010_m_000016_0:   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLog$Writer.writeIndexRecord(TaskLog.java:251)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLog$Writer.close(TaskLog.java:235)
task_0010_m_000016_0:   at org.apache.hadoop.mapred.TaskLogAppender.close(TaskLogAppender.java:67)
task_0010_m_000016_0:   at org.apache.log4j.AppenderSkeleton.finalize(AppenderSkeleton.java:124)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
task_0010_m_000016_0:   at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
task_0010_m_000016_0:   at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
