[ https://issues.apache.org/jira/browse/HADOOP-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley resolved HADOOP-1685.
-----------------------------------
Resolution: Duplicate
Fix Version/s: 0.15.0
This is caused by a misunderstanding of the config system, in a way that should
be fixed by HADOOP-785. In particular, hadoop-site.xml does NOT set defaults:
the values in it are forced onto the client. So by forcing the number of
reduces on some of the machines but not others, you got bad results. Basically,
the number of reduces is a client-side value that is being overridden by the
server's configuration file.
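For illustration, here is a minimal client-side sketch, assuming the 0.13-era
JobConf API (the class name is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceCountDemo {
      public static void main(String[] args) {
        // The client asks for 3 reduces in its job conf.
        JobConf conf = new JobConf(ReduceCountDemo.class);
        conf.setNumReduceTasks(3);
        // Whether 3 actually takes effect depends on each machine's
        // hadoop-site.xml: per the behavior described above, a site file
        // that sets mapred.reduce.tasks forces its value over the job's
        // setting rather than acting as a default.
        System.out.println(conf.getNumReduceTasks());
      }
    }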
That said, a lot of us are looking forward to a lot less confusion once
HADOOP-785 is done; the current situation is very confusing, with lots of
strange corner cases.
The proper place to define the default number of maps and reduces is in
mapred-default.xml.
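For instance, a site-wide default of 7 reduces would be expressed there with
Hadoop's standard property stanza (the value is purely illustrative):

    <property>
      <name>mapred.reduce.tasks</name>
      <value>7</value>
      <description>Default number of reduces; jobs can still override
      this because it is a default, not a forced site value.</description>
    </property>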
> Possible getMapOutput() failures on tasktracker when mapred.reduce.tasks is
> overridden in job
> --------------------------------------------------------------------------------------------
>
> Key: HADOOP-1685
> URL: https://issues.apache.org/jira/browse/HADOOP-1685
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.1
> Environment: 6-node cluster, all running Red Hat Enterprise 3.0
> Standard Server (Update 4) on Java 6; 2 nodes are Xen virts
> Reporter: Jorgen Johnson
> Priority: Minor
> Fix For: 0.15.0
>
>
> The following error occurs many times on a job where I have set the
> number of reduce tasks lower than the default defined in my
> hadoop-site.xml. Working off my novice understanding of the Hadoop
> infrastructure at this point, it appears that the JobTracker is not
> honoring mapred.reduce.tasks as defined in the job conf, and is
> instead using the default.
> Map output lost, rescheduling: getMapOutput(task_0010_m_000002_0,6) failed :
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at java.io.DataInputStream.readLong(DataInputStream.java:399)
> at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:1911)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:747)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> i.e., hadoop-site.xml defines mapred.reduce.tasks=7,
> and in my job I define mapred.reduce.tasks=3.
> I get many errors looking for:
> getMapOutput(task_0010_m_000002_0,3)
> getMapOutput(task_0010_m_000002_0,4)
> getMapOutput(task_0010_m_000002_0,5)
> getMapOutput(task_0010_m_000002_0,6)
> This additional error appears to be a side-effect of the actual problem (it
> stopped happening when I changed the job conf to match the default number of
> reduce tasks):
> task_0010_m_000016_0: log4j:ERROR Failed to close the task's log with the exception: java.io.IOException: Bad file descriptor
> task_0010_m_000016_0: at java.io.FileOutputStream.writeBytes(Native Method)
> task_0010_m_000016_0: at java.io.FileOutputStream.write(FileOutputStream.java:260)
> task_0010_m_000016_0: at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> task_0010_m_000016_0: at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> task_0010_m_000016_0: at org.apache.hadoop.mapred.TaskLog$Writer.writeIndexRecord(TaskLog.java:251)
> task_0010_m_000016_0: at org.apache.hadoop.mapred.TaskLog$Writer.close(TaskLog.java:235)
> task_0010_m_000016_0: at org.apache.hadoop.mapred.TaskLogAppender.close(TaskLogAppender.java:67)
> task_0010_m_000016_0: at org.apache.log4j.AppenderSkeleton.finalize(AppenderSkeleton.java:124)
> task_0010_m_000016_0: at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
> task_0010_m_000016_0: at java.lang.ref.Finalizer.runFinalizer(Finalizer.java:83)
> task_0010_m_000016_0: at java.lang.ref.Finalizer.access$100(Finalizer.java:14)
> task_0010_m_000016_0: at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:160)