Hi, I got it. It should be declared in hadoop-env.sh:
export HADOOP_CLIENT_OPTS=-Xmx4000m
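To confirm the setting actually reaches the client JVM, a minimal check can help (the HeapCheck class name is mine, just for illustration):

public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports the heap ceiling the JVM was launched with.
        long maxMB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMB + " MB");
    }
}

Running it as HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar HeapCheck.jar HeapCheck should print a value close to 4000 MB; if it prints the default instead, the variable is not reaching the client JVM.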
Thanks! At the same time I see corrections coming in.
Shi
On 2010-10-13 18:18, Shi Yu wrote:
Hi, I tried the following five approaches:
Approach 1: on the command line
HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar WordCount.jar OOloadtest
Approach 2: I added the following element to the hadoop-site.xml file. After each change, I stopped and restarted Hadoop on all the nodes.
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>-Xmx4000m</value>
</property>
Then I ran the command:
$bin/hadoop jar WordCount.jar OOloadtest
Approach 3: I changed it like this:
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000m</value>
</property>
....
Then I ran the command:
$bin/hadoop jar WordCount.jar OOloadtest
Approach 4: To make sure, I changed the "m" suffix to a plain number:
...
<property>
<name>HADOOP_CLIENT_OPTS</name>
<value>4000000000</value>
</property>
....
Then I ran the command:
$bin/hadoop jar WordCount.jar OOloadtest
All four of these approaches end with the same "Java heap space" error.
java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
    at java.lang.StringBuilder.<init>(StringBuilder.java:68)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2997)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2818)
    at java.io.ObjectInputStream.readString(ObjectInputStream.java:1599)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at java.util.HashMap.readObject(HashMap.java:1028)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1846)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1753)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at ObjectManager.loadObject(ObjectManager.java:42)
    at OOloadtest.main(OOloadtest.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Approach 5:
In comparison, I called the Java command directly as follows (there is a counter showing how long the load takes if the serialized object is successfully loaded):
$java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar OOloadtest
It returns:
object loaded, timing (hms): 0 hour(s) 1 minute(s) 12 second(s) 162 millisecond(s)
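(The timing counter itself is trivial; here is a sketch of what such a counter looks like, since the actual OOloadtest source is not reproduced in this thread:)

import java.io.FileInputStream;
import java.io.ObjectInputStream;

public class LoadTimer {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        ObjectInputStream ois =
                new ObjectInputStream(new FileInputStream("xxx.dat"));
        Object obj = ois.readObject(); // the expensive deserialization step
        ois.close();
        long t = System.currentTimeMillis() - start;
        System.out.println("object loaded, timing (hms): "
                + t / 3600000 + " hour(s) "
                + t % 3600000 / 60000 + " minute(s) "
                + t % 60000 / 1000 + " second(s) "
                + t % 1000 + " millisecond(s)");
    }
}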
What was the problem in my command? Where can I find the documentation about HADOOP_CLIENT_OPTS? Have you tried the same thing and found that it works?
Shi
On 2010-10-13 16:28, Luke Lu wrote:
On Wed, Oct 13, 2010 at 2:21 PM, Shi Yu<sh...@uchicago.edu> wrote:
Hi, thanks for the advice. I tried with your settings,
$ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
Still no effect. Or is this a system variable? Should I export it? How do I configure it?
HADOOP_CLIENT_OPTS is an environment variable, so you should run it as
HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar Test.jar OOloadtest
if you use sh-derivative shells (bash, ksh, etc.); prepend env for other shells.
__Luke
Shi
java -Xms3G -Xmx3G -classpath
.:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
OOloadtest
On 2010-10-13 15:28, Luke Lu wrote:
On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu<sh...@uchicago.edu> wrote:
I haven't implemented anything in map/reduce yet for this issue. I just try to invoke the same Java class using the bin/hadoop command. The thing is, a very simple program can be executed in Java but is not doable via the bin/hadoop command.
If you are just trying to use the bin/hadoop jar your.jar command, your code runs in a local client JVM and mapred.child.java.opts has no effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop jar your.jar
I think if I can't get through this first stage, then even if I had a map/reduce program it would also fail. I am using Hadoop 0.19.2.
Thanks.
Best Regards,
Shi
On 2010-10-13 14:15, Luke Lu wrote:
Can you post your mapper/reducer implementation? Or are you using Hadoop streaming, for which mapred.child.java.opts doesn't apply to the JVM you care about? BTW, what's the Hadoop version you're using?
On Wed, Oct 13, 2010 at 11:45 AM, Shi Yu <sh...@uchicago.edu> wrote:
Here is my code. There is no Map/Reduce in it. I can run this code using java -Xmx1000m; however, when using bin/hadoop -D mapred.child.java.opts=-Xmx3000M it runs out of heap space. I have tried other programs in Hadoop with the same settings, so the memory is available on my machines.
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.HashMap;

public static void main(String[] args) {
    try {
        String myFile = "xxx.dat";
        FileInputStream fin = new FileInputStream(myFile);
        ObjectInputStream ois = new ObjectInputStream(fin);
        // Deserialize the stored map; this is the step that runs out of heap.
        HashMap margintagMap = (HashMap) ois.readObject();
        ois.close();
        fin.close();
    } catch (Exception e) {
        // exception ignored in the original
    }
}
On 2010-10-13 13:30, Luke Lu wrote:
On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu <sh...@uchicago.edu> wrote:
As a follow-up to my own question, I think invoking the JVM in Hadoop requires much more memory than an ordinary JVM.
That's simply not true. The default mapreduce task Xmx is 200M, which is much smaller than the standard JVM default of 512M, and most users don't need to increase it. Please post the code reading the object (in HDFS?) in your tasks.
I found that instead of serializing the object, maybe I could create a MapFile as an index to permit lookups by key in Hadoop. I have also compared the performance of MongoDB and Memcache. I will let you know the result after I try the MapFile approach; a sketch of what I have in mind follows.
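Here the Text key/value types, the sample entries, and the directory name are illustrative assumptions against the 0.19-era API, not code I have run:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "margintag.map"; // hypothetical MapFile directory

        // Build the index once; MapFile requires keys in sorted order.
        MapFile.Writer writer =
                new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("apple"), new Text("NN"));
        writer.append(new Text("banana"), new Text("NN"));
        writer.close();

        // Look up a single key without deserializing the whole map into heap.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        if (reader.get(new Text("banana"), value) != null) {
            System.out.println("banana -> " + value);
        }
        reader.close();
    }
}

The point of this design is that lookups go through the MapFile's index instead of requiring the entire 200M HashMap to live in the client JVM's heap.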
Shi
On 2010-10-12 21:59, M. C. Srivas wrote:
On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu <sh...@uchicago.edu> wrote:
Hi,
I want to load a serialized HashMap object in Hadoop. The file of the stored object is 200M. I can read that object efficiently in Java by setting -Xmx to 1000M. However, in Hadoop I can never load it into memory. The code is very simple (it just reads from an ObjectInputStream) and there is no map/reduce implemented yet. I set mapred.child.java.opts=-Xmx3000M and still get the "java.lang.OutOfMemoryError: Java heap space". Could anyone explain a little how memory is allocated to the JVM in Hadoop? Why does Hadoop take up so much memory? If a program requires 1G of memory on a single node, how much memory does it (generally) require in Hadoop?
The JVM reserves swap space in advance, at the time of launching the process. If your swap is too low (or you do not have any swap configured), you will hit this. Or you are on a 32-bit machine, in which case a 3G heap is not possible in the JVM.
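A quick way to check both conditions (a minimal sketch; sun.arch.data.model is a Sun-JVM-specific property, and the class name is mine):

public class JvmCheck {
    public static void main(String[] args) {
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        // Prints 32 or 64 on Sun JVMs; a 32-bit JVM cannot address a 3G heap.
        System.out.println("data model = "
                + System.getProperty("sun.arch.data.model") + "-bit");
    }
}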
-Srivas.
Thanks.
Shi
--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799