Reducer Out of Memory
Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just 1.7GB of memory. The input size is 50GB, and as a result the input is automatically split into 786 map tasks. These run fine. However, I am setting the number of reduce tasks to 18, and this is where I get a Java heap out-of-memory error:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    at java.nio.CharBuffer.toString(CharBuffer.java:1157)
    at org.apache.hadoop.io.Text.decode(Text.java:350)
    at org.apache.hadoop.io.Text.decode(Text.java:327)
    at org.apache.hadoop.io.Text.toString(Text.java:254)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Re: Reducer Out of Memory
Maybe you need to allocate a larger JVM heap by using the parameter -Xmx1024m.

On Thu, Feb 12, 2009 at 10:56 AM, Kris Jirapinyo <kjirapi...@biz360.com> wrote:
> Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just
> 1.7GB of memory. The input size is 50GB ...
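In case it helps, here is a minimal sketch of how that flag usually gets handed to the child task JVMs from the job driver with the old mapred API shown in your stack trace (the driver class name and paths here are made up, not from your job); mapred.child.java.opts is the property that carries the child JVM options:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HeapTuningDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HeapTuningDriver.class);
            conf.setJobName("heap-tuning-sketch");
            // Every spawned map/reduce child JVM is launched with these options.
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf); // identity map/reduce by default
        }
    }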
Re: Reducer Out of Memory
Darn that send button. Anyway, I was wondering if my understanding is correct: there will be exactly as many output files as the number of reduce tasks I set, so in my output directory from the reducer I should always see only 18 files.

If that is right, then when I call output.collect() in my reducer, does the output only get flushed at the end, when that particular reduce task finishes? If so, it seems that as my input grows, 18 reducers will not be able to handle the sheer volume of my data, since the collector will keep having to take on more and more. So I guess this is the question: do I have to keep increasing the number of reduce tasks so that each reducer takes a smaller bite out of the chunk? That is, if I'm running out of Java heap space and I don't want to add more nodes, do I need to set my reduce task count to, say, 36, and so on? (See the sketch at the bottom of this mail for what I mean.)

It just seems like I'm missing something. Of course, I could always add more nodes or upgrade to a larger instance so I get more memory, but that's the obvious solution (I just hope it's not the only one). What I'm saying is that I thought the reducer would be smart enough to know it's taking too big a bite out of the whole chunk (like the mapper does) and readjust itself, since I don't really care how many output files I get in the end, just that the results from the reducer stay under one directory.

On Wed, Feb 11, 2009 at 6:56 PM, Kris Jirapinyo <kjirapi...@biz360.com> wrote:
> Hi all, I am running a data-intensive job on 18 nodes on EC2, each with just
> 1.7GB of memory. The input size is 50GB ...
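To be concrete, this is the knob I mean; a minimal sketch with the old mapred API (the driver class name is made up):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceCountSketch {
        public static JobConf configure() {
            JobConf conf = new JobConf(ReduceCountSketch.class);
            // One output file per reduce task: part-00000 .. part-00035,
            // all under the single job output directory.
            conf.setNumReduceTasks(36);
            return conf;
        }
    }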
Re: Reducer Out of Memory
I tried that, but with 1.7GB that will not allow me to run 1 mapper and 1 reducer concurrently (I think that when you use -Xmx1024m it tries to reserve that much physical memory?). So, to be safe, I set it to -Xmx768m. The error I get when I use 1024m is this:

java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:321)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2079)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:457)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
    ... 10 more

On Wed, Feb 11, 2009 at 7:02 PM, Rocks Lei Wang <beyiw...@gmail.com> wrote:
> Maybe you need to allocate a larger JVM heap by using the parameter
> -Xmx1024m. ...
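Doing the rough arithmetic on why 1024m falls over (the daemon heap figures below are my guesses, not measured):

    2 child JVMs x 1024 MB (-Xmx)           = 2048 MB
    TaskTracker + DataNode daemon heaps     = a few hundred MB more
    physical RAM on the node                = ~1700 MB

On top of the over-commit, the "Cannot run program bash" frame comes from the df check in DF.getAvailable: to launch bash, fork() first has to duplicate the reduce task's whole JVM address space, so with memory already oversubscribed and little or no swap, the fork itself fails with error=12 (ENOMEM) even though the JVM is still running.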