Hi,

For a few weeks now, we have been experiencing a rather annoying problem
with a Nutch/Hadoop installation.

It's a very simple setup: the Hadoop configuration is the default that
ships with Nutch, and the Hadoop version is the hadoop-0.17.1 jar
provided by Nutch.

During the injection operation, we now get the following errors in one
of the MapReduce tasks:
2009-01-21 00:00:03,344 WARN fs.AllocatorPerContext -
org.apache.hadoop.util.DiskChecker$DiskErrorException: can not create
directory: 
/tmp/hadoop-moteur/mapred/local/taskTracker/jobcache/job_local_1/reduce_3fm7iw/output
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:73)
        at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:253)
        at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:298)
        at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at 
org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:159)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)

2009-01-21 00:00:03,347 WARN mapred.LocalJobRunner - job_local_1
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
any valid local directory for
taskTracker/jobcache/job_local_1/reduce_3fm7iw/output/map_0.out
        at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:313)
        at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at 
org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:159)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
2009-01-21 00:00:04,211 FATAL crawl.Injector - Injector:
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
        at org.apache.nutch.crawl.Injector.run(Injector.java:190)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Injector.main(Injector.java:180)

The problem is that the directory "reduce_3fm7iw" cannot be created
because the job_local_1/ directory already contains 32k subdirectories,
which is the maximum number of entries per directory our filesystem
allows.

Is there any way to configure Hadoop to limit the number of directories
it creates (even if it makes the job slower), or is there some other
solution to this problem?
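
For what it's worth, here is a sketch of the kind of change we were
considering in conf/hadoop-site.xml. It assumes mapred.local.dir accepts
a comma-separated list of directories in 0.17.1 and that the jobcache
entries get spread across them; the paths below are just placeholders:

<!-- Hypothetical sketch: spread local MapReduce scratch space over
     several directories so that no single jobcache directory has to
     hold all the reduce_* subdirectories. Paths are made up. -->
<property>
  <name>mapred.local.dir</name>
  <value>/tmp/hadoop-moteur/mapred/local1,/tmp/hadoop-moteur/mapred/local2,/tmp/hadoop-moteur/mapred/local3</value>
</property>

We're not sure this actually helps, though, since each of these
directories could still accumulate its own jobcache entries over time.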

I wonder whether setting dfs.max.objects would solve the problem, but
I'm not sure of the consequences it might have.
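
For reference, this is how we understood that property would be set
(again in conf/hadoop-site.xml). The value is only an illustration, and
we don't know whether it even applies here, since the directories in
question are on the local filesystem rather than in HDFS:

<!-- Hypothetical sketch: cap the total number of files and directories
     the HDFS namenode will allow (the default of 0 means no limit). -->
<property>
  <name>dfs.max.objects</name>
  <value>500000</value>
</property>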

If additional information is needed, I'll be glad to provide it.

Thanks.

-- 
Guillaume
