As a follow-up to my own question, I think invoking the JVM in Hadoop requires much more memory than an ordinary JVM. I found that instead of serializing the object, I could perhaps create a MapFile as an index to permit lookups by key in Hadoop. I have also compared the performance of MongoDB and Memcache. I will report the results after I try the MapFile approach.
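
Roughly, the MapFile approach I have in mind looks like the sketch below
(untested, written against the org.apache.hadoop.io.MapFile API; the
directory name and the Text key/value types are only placeholders):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookupSketch {

  // Build the MapFile; keys must be appended in sorted order.
  public static void build(Configuration conf, String dir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    MapFile.Writer writer =
        new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
    try {
      writer.append(new Text("apple"), new Text("1"));
      writer.append(new Text("banana"), new Text("2"));
    } finally {
      writer.close();
    }
  }

  // Look up a single key without deserializing the whole map.
  public static String lookup(Configuration conf, String dir, String key)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
    try {
      Text value = new Text();
      return reader.get(new Text(key), value) != null ? value.toString() : null;
    } finally {
      reader.close();
    }
  }
}

Since MapFile keeps only its index in memory and seeks into the data file
on disk, a single key lookup should not require holding the whole 200 MB
structure in the heap.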

Shi

On 2010-10-12 21:59, M. C. Srivas wrote:

On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:

Hi,

I want to load a serialized HashMap object in Hadoop. The file of the
stored object is 200 MB. I can read that object efficiently in plain
Java by setting -Xmx to 1000M. However, in Hadoop I can never load it
into memory. The code is very simple (it just reads from an
ObjectInputStream) and no map/reduce is implemented yet. I set
mapred.child.java.opts=-Xmx3000M and still get
"java.lang.OutOfMemoryError: Java heap space". Could anyone explain a
little how memory is allocated to the JVM in Hadoop? Why does Hadoop
take up so much memory? If a program requires 1 GB of memory on a
single node, how much memory does it (generally) require in Hadoop?
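
For reference, the read is essentially the following (a rough sketch;
the HDFS path /user/shi/hashmap.ser is just a placeholder):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.HashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadSerializedMap {
  public static void main(String[] args)
      throws IOException, ClassNotFoundException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Open the stored object from HDFS and deserialize it in one shot.
    ObjectInputStream in =
        new ObjectInputStream(fs.open(new Path("/user/shi/hashmap.ser")));
    try {
      @SuppressWarnings("unchecked")
      HashMap<String, String> map = (HashMap<String, String>) in.readObject();
      System.out.println("loaded " + map.size() + " entries");
    } finally {
      in.close();
    }
  }
}
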
The JVM reserves swap space in advance, at the time it launches the
process. If your swap is too low (or you do not have any swap
configured), you will hit this.

Or, you are on a 32-bit machine, in which case a 3 GB heap is not
possible in the JVM.

-Srivas.



Thanks.

Shi
