Re: Newbie: error=24, Too many open files

2008-11-23 Thread Amareshwari Sriramadasu

tim robertson wrote:

Hi all,

I am running an MR job which scans 130M records and then tries to
group them into around 64,000 files.

The map does the grouping of the records by determining the key, and
then I use a MultipleTextOutputFormat to write the files based on the
key:

@Override
protected String generateFileNameForKeyValue(WritableComparable key,
    Writable value, String name) {
  return "cell_" + key.toString();
}
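
For context, a fuller sketch of how such a MultipleTextOutputFormat subclass
might look against the old mapred API (the class name CellOutputFormat and the
raw key/value types are assumptions; only the generateFileNameForKeyValue
override above is from the message):

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Hypothetical subclass: routes every record to an output file named after its key.
public class CellOutputFormat
    extends MultipleTextOutputFormat<WritableComparable, Writable> {

  @Override
  protected String generateFileNameForKeyValue(WritableComparable key,
      Writable value, String name) {
    // One file per distinct key, e.g. "cell_1234"; MultipleOutputFormat keeps a
    // RecordWriter open for each distinct file name the reducer has seen.
    return "cell_" + key.toString();
  }
}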

This approach works for small input files, but for the 130M it fails with:

org.apache.hadoop.mapred.Merger$MergeQueue Down to the last merge-pass, with 10 segments left of total size: 12291866391 bytes
org.apache.hadoop.mapred.LocalJobRunner$Job reduce  reduce
org.apache.hadoop.mapred.JobClient  map 100% reduce 66%
org.apache.hadoop.mapred.LocalJobRunner$Job reduce  reduce
...
org.apache.hadoop.mapred.LocalJobRunner$Job reduce  reduce
org.apache.hadoop.mapred.LocalJobRunner$Job job_local_0001
java.io.IOException: Cannot run program "chmod": error=24, Too many open files
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:317)
    at org.apache.hadoop.fs.RawLocalFileSystem.execCommand(RawLocalFileSystem.java:540)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:532)
    at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:284)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:503)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:403)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:117)
    at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44)
    at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:300)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:201)
Caused by: java.io.IOException: error=24, Too many open files
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 17 more
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1113)
    at com.ibiodiversity.index.mapreduce.occurrence.geometry.OccurrenceByPolygonIntersection.splitOccurrenceDataIntoCells(OccurrenceByPolygonIntersection.java:95)
    at com.ibiodiversity.index.mapreduce.occurrence.geometry.OccurrenceByPolygonIntersection.run(OccurrenceByPolygonIntersection.java:54)
    at com.ibiodiversity.index.mapreduce.occurrence.geometry.OccurrenceByPolygonIntersection.main(OccurrenceByPolygonIntersection.java:190)


Is this a problem caused by working on my single machine at the
moment, one that will go away when I run on the cluster of 25?

  
Yes. The problem could be because of the single machine and the
LocalJobRunner. I think this should go away on a cluster.

-Amareshwari

I am configuring the job:
  conf.setNumMapTasks(10);
  conf.setNumReduceTasks(5);

Are there perhaps better parameters so it does not try to manage the
temp files all in one go?
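
For reference, a rough sketch of how that configuration might fit together in a
driver (the class name, the input/output paths, and the io.sort.factor line are
assumptions added here for illustration, not from the thread; only the two
setNum*Tasks calls above are):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver using the old mapred API seen in the stack trace above.
public class SplitOccurrencesDriver {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(SplitOccurrencesDriver.class);
    conf.setJobName("split occurrences into cells");

    conf.setNumMapTasks(10);   // a hint only; the real map count follows the input splits
    conf.setNumReduceTasks(5); // five reducers share the ~64,000 output files

    // Route reduce output through the MultipleTextOutputFormat subclass sketched earlier.
    conf.setOutputFormat(CellOutputFormat.class);

    // io.sort.factor (default 10) caps how many spill segments are merged at once;
    // it is what produces the "Down to the last merge-pass, with 10 segments" message,
    // but it does not limit the files the reducer holds open while writing output.
    conf.setInt("io.sort.factor", 10);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}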

Thanks for helping!

Tim
  




Re: Newbie: error=24, Too many open files

2008-11-23 Thread Jeremy Chow
There is a limit on the number of files each process can open in Unix/Linux. The
default on Linux is 1024. You can use

ulimit -n <number>

to change this limit and

ulimit -n

to show it.

Regards,
Jeremy
-- 
My research interests are distributed systems, parallel computing, and
bytecode-based virtual machines.

http://coderplay.javaeye.com


Re: Newbie: error=24, Too many open files

2008-11-23 Thread tim robertson
Thank you Jeremy

I am on a Mac (10.5.5) and the limit is 256 by default. I will change this and
rerun before running on the cluster.

Thanks again

Tim


On Mon, Nov 24, 2008 at 8:38 AM, Jeremy Chow [EMAIL PROTECTED] wrote:
 There is a limit on the number of files each process can open in Unix/Linux. The
 default on Linux is 1024. You can use

 ulimit -n <number>

 to change this limit and

 ulimit -n

 to show it.

 Regards,
 Jeremy
 --
 My research interests are distributed systems, parallel computing, and
 bytecode-based virtual machines.

 http://coderplay.javaeye.com