I've created a JIRA issue describing my problems running under IsolationRunner:
https://issues.apache.org/jira/browse/HADOOP-4041

If anyone is using IsolationRunner successfully to re-run failed tasks in a
single JVM, can you please, pretty please, describe how you do that?

Thank you,

-Yuri

On Friday 08 August 2008 10:09:48 Yuri Pradkin wrote:
> On Thursday 07 August 2008 16:43:10 John Heidemann wrote:
> > On Thu, 07 Aug 2008 19:42:05 +0200, "Leon Mergen" wrote:
> > >Hello John,
> > >
> > >On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann <[EMAIL PROTECTED]> wrote:
> > >> I have a large Hadoop streaming job that generally works fine,
> > >> but a few (2-4) of the ~3000 maps and reduces have problems.
> > >> To make matters worse, the problems are system-dependent (we run on a
> > >> cluster with machines of slightly different OS versions).
> > >> I'd of course like to debug these problems, but they are embedded in a
> > >> large job.
> > >>
> > >> Is there a way to extract the input given to a reducer from a job,
> > >> given the task identity? (This would also be helpful for mappers.)
> > >
> > >I believe you should set "keep.failed.tasks.files" to true -- this way,
> > >given a task id, you can see what input files it has in
> > >~/taskTracker/${taskid}/work (source:
> > >http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#IsolationRunner )
>
> IsolationRunner does not work as described in the tutorial. After the task
> hung, I failed it via the web interface. Then I went to the node that was
> running this task:
>
> $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> (this path is already different from the tutorial's)
>
> $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
>
> Looking at the IsolationRunner code, I see this:
>
> 164     File workDirName = new File(lDirAlloc.getLocalPathToRead(
> 165         TaskTracker.getJobCacheSubdir()
> 166         + Path.SEPARATOR + taskId.getJobID()
> 167         + Path.SEPARATOR + taskId
> 168         + Path.SEPARATOR + "work",
> 169         conf).toString());
>
> I.e. it assumes there is supposed to be a taskId subdirectory under the job
> dir, but:
>
> $ pwd
> ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> $ ls
> jars job.xml work
>
> -- it's not there. Any suggestions?
>
> Thanks,
>
> -Yuri
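
For anyone who wants to check their own nodes, the layout that the quoted IsolationRunner code appears to expect (job directory, then task id, then "work") can be tested with a small shell sketch. The function names and both arguments are hypothetical placeholders, not part of Hadoop, and the check only looks under the single job directory you pass in, so treat it as a rough diagnostic rather than a description of what Hadoop does internally.

```shell
#!/bin/sh
# Sketch only: mirrors the path shape built at IsolationRunner.java:164,
#   <job dir under taskTracker/jobcache>/<task id>/work
# and reports whether that directory exists on this node.
# Both arguments are supplied by the caller; nothing here calls Hadoop.

expected_work_dir() {
    # $1 = job directory (hypothetical, e.g. the directory holding job.xml)
    # $2 = task id (hypothetical, taken from the web interface)
    printf '%s/%s/work\n' "$1" "$2"
}

check_work_dir() {
    dir=$(expected_work_dir "$1" "$2")
    if [ -d "$dir" ]; then
        echo "found: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

Running check_work_dir with the job directory and the id of the failed task prints "missing: ..." in the situation described above, where the job directory contains only jars, job.xml and work.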