Do you experience the problem with and without native compression? Set hadoop.native.lib to false to disable native compression.
Cheers,
Tom

On Tue, Feb 24, 2009 at 9:40 PM, Gordon Mohr <[email protected]> wrote:
> If you're doing a lot of gzip compression/decompression, you *might* be
> hitting this 6+-year-old Sun JVM bug:
>
> "Instantiating Inflater/Deflater causes OutOfMemoryError; finalizers not
> called promptly enough"
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4797189
>
> A workaround is listed in the issue: ensuring you call close() or end() on
> the Deflater; something similar might apply to Inflater.
>
> (This is one of those fun JVM situations where having more heap space may
> make OOMEs more likely: less heap memory pressure leaves more un-GCd or
> un-finalized heap objects around, each of which is holding a bit of native
> memory.)
>
> - Gordon @ IA
>
> bzheng wrote:
>>
>> I have about 24k gz files (about 550GB total) on hdfs and have a really
>> simple java program to convert them into sequence files. If the script's
>> setInputPaths takes a Path[] of all 24k files, it will get an OutOfMemory
>> error at about 35% map complete. If I make the script process 2k files
>> per job and run 12 jobs consecutively, then it goes through all files
>> fine. The cluster I'm using has about 67 nodes. Each node has 16GB
>> memory, max 7 map, and max 2 reduce.
>>
>> The map task is really simple, it takes LongWritable as key and Text as
>> value, generates a Text newKey, and calls output.collect(Text newKey,
>> Text value). It doesn't have any code that can possibly leak memory.
>>
>> There's no stack trace for the vast majority of the OutOfMemory errors,
>> there's just a single line in the log like this:
>>
>> 2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
>> java.lang.OutOfMemoryError: Java heap space
>>
>> I can't find the stack trace right now, but rarely the OutOfMemory error
>> originates from some hadoop config array copy operation. There's no
>> special config for the script.
>
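
For reference, the workaround Gordon mentions (calling end() rather than waiting for the finalizer) might look roughly like the sketch below. It assumes code that uses java.util.zip directly; Hadoop's codecs manage their own compressor objects, so this only illustrates the pattern. The class and method names (ZlibEndExample, compress, decompress) are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class; not part of Hadoop.
public class ZlibEndExample {

    // Compress a byte array and release the Deflater's native memory
    // explicitly instead of relying on finalization.
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // frees the native zlib state right away
        }
    }

    // Decompress data produced by compress(), again calling end() in finally.
    public static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no more input means the data is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end(); // same workaround applied to Inflater
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hadoop".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The try/finally is the whole point: end() runs even if compression throws, so the native buffers never wait on the garbage collector.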
