Hmm. DEBUG entries are just debug-level detail. The ERROR and WARN level
entries are the problematic ones.

What does your hadoop-site.xml file look like? If you're storing data
underneath ${hadoop.tmp.dir} and that's set to /tmp/hadoop-${user.name} (as
is the default), then it's possible that tmpwatch or a similar cleanup job
is throwing away Hadoop data if your jobs keep more intermediate results in
/tmp than the system expects to be stored there. You could try changing
hadoop.tmp.dir to some other directory, like /home/hadoop/data, to see
whether that's the culprit.
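
If it helps, here is a rough sketch of that check in Python. It embeds a
hypothetical hadoop-site.xml override (using the example directory above)
and reports whether hadoop.tmp.dir ends up under /tmp. The helper names
get_property and lives_under_tmp are made up for illustration and are not
part of Hadoop; the sketch also assumes that an unset property falls back
to a default under /tmp.

```python
import xml.etree.ElementTree as ET

# Hypothetical hadoop-site.xml contents. /home/hadoop/data is only the
# example directory from this thread, not a required location.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/data</value>
  </property>
</configuration>
"""


def get_property(site_xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(site_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def lives_under_tmp(tmp_dir):
    """Return True if tmp_dir is /tmp or a directory below it.

    None means the property is unset; this sketch assumes Hadoop then
    falls back to a default under /tmp, so it is treated as at risk.
    """
    if tmp_dir is None:
        return True
    return tmp_dir == "/tmp" or tmp_dir.startswith("/tmp/")


if __name__ == "__main__":
    value = get_property(SAMPLE_SITE_XML, "hadoop.tmp.dir")
    print("hadoop.tmp.dir =", value)
    print("at risk from /tmp cleanup:", lives_under_tmp(value))
```

To check a real cluster, read the actual hadoop-site.xml from each node's
conf directory instead of the embedded sample, and restart the daemons
after changing the property.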

- Aaron

On Fri, Jul 10, 2009 at 8:23 PM, Ian jonhson <jonhson....@gmail.com> wrote:

> On Sat, Jul 11, 2009 at 8:42 AM, Aaron Kimball<aa...@cloudera.com> wrote:
> > Huh. If you look at the JobTracker or TaskTracker log files, do they start
> > getting any WARN or ERROR lines around the time jobs start to fail?
>
> Oh... I see a lot of DEBUG information being written out. It seems
> something is wrong in Hadoop.
>
> The message repeated in the JobTracker log file is:
>
> ----------------  dump of screen ----------------
> ...
> 2009-07-10 00:00:03,584 DEBUG org.apache.hadoop.mapred.JobTracker: Got
> heartbeat from:
> tracker_hdt2.hyperdomain:localhost.localdomain/127.0.0.1:60262
> (initialContact: false acceptNewTasks: true) with responseId: 15201
> 2009-07-10 00:00:03,717 DEBUG org.apache.hadoop.mapred.JobTracker: Got
> heartbeat from:
> tracker_hdt0.hyperdomain:localhost.localdomain/127.0.0.1:33338
> (initialContact: false acceptNewTasks: true) with responseId: 13378
> 2009-07-10 00:00:03,718 INFO org.apache.hadoop.mapred.JobTracker:
> Serious problem.  While updating status, cannot find taskid
> attempt_200907051329_0003_r_000000_0
> .....
> ---------------------------------------------------------
>
> And the messages repeated in the TaskTracker log file are:
>
> -------------- dump of screen ------------------
> ...
> 2009-07-10 00:00:03,597 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200907051329_0003/attempt_200907051329_0003_r_000000_0/output/file.out
> in any of the configured local directories
> 2009-07-10 00:00:08,719 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200907051329_0003/attempt_200907051329_0003_r_000000_0/output/file.out
> in any of the configured local directories
> 2009-07-10 00:00:13,720 INFO org.apache.hadoop.mapred.TaskTracker:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>
> taskTracker/jobcache/job_200907051329_0003/attempt_200907051329_0003_r_000000_0/output/file.out
> in any of the configured local directories
> ...
> --------------------------------------------------------
>
> My Hadoop cluster is built on three nodes:
>
> hdt0.hypercloud.ict (master, node)
> hdt1.hypercloud.ict (node)
> hdt2.hypercloud.ict (node)
>
> Any help?
>
>
> Ian
>
