As a follow-up to my own question, I think invoking the JVM in Hadoop requires much more memory than an ordinary JVM. I found that instead of serializing the object, I could perhaps create a MapFile as an index to permit lookups by key in Hadoop. I have also compared the performance of MongoDB and Memcache. I will report the results after I try the MapFile approach.
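
Roughly, the MapFile approach I have in mind looks like the sketch below
(untested, written against the org.apache.hadoop.io.MapFile API; the
directory name and the Text key/value types are only placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookupSketch {

  // Build the MapFile; keys must be appended in sorted order.
  public static void build(Configuration conf, String dir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
    try {
      writer.append(new Text("apple"), new Text("1"));
      writer.append(new Text("banana"), new Text("2"));
    } finally {
      writer.close();
    }
  }

  // Look up a single key without deserializing the whole map.
  public static String lookup(Configuration conf, String dir, String key)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    try {
      Text value = new Text();
      return reader.get(new Text(key), value) != null ? value.toString() : null;
    } finally {
      reader.close();
    }
  }
}

Since MapFile keeps only its index in memory and seeks into the data file
on disk, a single key lookup should not require holding the whole 200 MB
structure in the heap.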

Shi

On 2010-10-12 21:59, M. C. Srivas wrote:

On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:

Hi,

I want to load a serialized HashMap object in Hadoop. The file of the
stored object is 200 MB. I can read that object efficiently in plain
Java by setting -Xmx to 1000M. However, in Hadoop I can never load it
into memory. The code is very simple (it just reads from an
ObjectInputStream) and no map/reduce is implemented yet. I set
mapred.child.java.opts=-Xmx3000M and still get
"java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
little how memory is allocated to the JVM in Hadoop? Why does Hadoop
take up so much memory? If a program requires 1 GB of memory on a
single node, how much memory does it (generally) require in Hadoop?
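
For reference, the read is essentially the following (a rough sketch;
the HDFS path /user/shi/hashmap.ser is just a placeholder):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadSerializedMap {
  public static void main(String[] args)
      throws IOException, ClassNotFoundException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Open the stored object from HDFS and deserialize it in one shot.
    ObjectInputStream in =
        new ObjectInputStream(fs.open(new Path("/user/shi/hashmap.ser")));
    try {
      @SuppressWarnings("unchecked")
      HashMap<String, String> map = (HashMap<String, String>) in.readObject();
      System.out.println("loaded " + map.size() + " entries");
    } finally {
      in.close();
    }
  }
}
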
The JVM reserves swap space in advance, at the time it launches the
process. If your swap is too low (or you do not have any swap
configured), you will hit this.

Or, you are on a 32-bit machine, in which case a 3 GB heap is not
possible in the JVM.

-Srivas.



Thanks.

Shi
