You might be hitting the "small files" problem. This has been discussed multiple times on the list; grepping through the archives will help. Also see http://www.cloudera.com/blog/2009/02/02/the-small-files-problem/
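
One common workaround is to pack the small files into a single SequenceFile (file name as key, file contents as value) and run the job over that, so you get a handful of big input splits instead of one map task per tiny file. A rough sketch, untested, assuming the 0.20-era API; the "packed.seq" output path and the input directory argument are placeholders:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // One big SequenceFile instead of 80,000 small ones:
    // key = original file name, value = file contents.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("packed.seq"), Text.class, Text.class);
    try {
      for (File f : new File(args[0]).listFiles()) {
        writer.append(new Text(f.getName()), new Text(readFile(f)));
      }
    } finally {
      writer.close();
    }
  }

  // Plain Java 6 file slurp, to avoid pulling in extra dependencies.
  private static String readFile(File f) throws IOException {
    byte[] buf = new byte[(int) f.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(f));
    try {
      in.readFully(buf);
    } finally {
      in.close();
    }
    return new String(buf, "UTF-8");
  }
}

Your job can then read the packed file with SequenceFileInputFormat and split each value back into lines in the mapper.
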
Ashutosh

On Sun, Oct 18, 2009 at 22:57, Kunsheng Chen <ke...@yahoo.com> wrote:
> I am running a Hadoop program to perform MapReduce work on files inside a
> folder.
>
> My program is basically doing Map and Reduce work: each line of every file
> is a pair of strings, and the result is each string associated with its
> occurrence count across all files.
>
> The program works fine until the number of files grows to about 80,000;
> then a 'cannot allocate memory' error occurs for some reason.
>
> Each file contains around 50 lines, but the total size of all files is no
> more than 1.5 GB. There are 3 datanodes performing the calculation, each of
> them with more than 10 GB of disk space left.
>
> I am wondering if this is normal for Hadoop because the data is too large?
> Or might it be my program's problem?
>
> It really shouldn't be, since Hadoop was developed for processing large
> data sets.
>
> Any idea is well appreciated