Hi Patrick,

Thanks for the explanation. I have supplied the heap size to the mapper in the following way:

-mapper /home/ftpuser1/Nodemapper5.groovy Xmx2000m \

but I still get the same error. Any other ideas?

Thanks

On Mon, Jul 12, 2010 at 6:12 PM, Patrick Angeles <patr...@cloudera.com> wrote:
> Shuja,
>
> Those settings (mapred.child.java.opts and mapred.child.ulimit) are only
> used for child JVMs that get forked by the TaskTracker. You are using
> Hadoop streaming, which means the TaskTracker forks a JVM for streaming,
> which in turn forks a shell process that runs your groovy code (in yet
> another JVM).
>
> I'm not much of a groovy expert, but if there is a way you can wrap your
> code around the MapReduce API, that would work best.
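> As a rough, untested sketch (the NodeMapper class and the parseXML body
> below are placeholders, not your actual code), something along these
> lines using the old mapred API:
>
>     import org.apache.hadoop.io.Text
>     import org.apache.hadoop.mapred.MapReduceBase
>     import org.apache.hadoop.mapred.Mapper
>     import org.apache.hadoop.mapred.OutputCollector
>     import org.apache.hadoop.mapred.Reporter
>
>     // This runs inside the TaskTracker's child JVM, so
>     // -D mapred.child.java.opts=-Xmx2000M applies to it directly.
>     class NodeMapper extends MapReduceBase implements Mapper<Text, Text, Text, Text> {
>         void map(Text key, Text value, OutputCollector<Text, Text> output, Reporter reporter) {
>             // stand-in for the real parsing logic from Nodemapper5.groovy
>             String parsed = parseXML(value.toString())
>             output.collect(key, new Text(parsed))
>         }
>
>         private String parseXML(String record) {
>             return record // placeholder
>         }
>     }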
>
> Otherwise, you can just pass the heap size in the '-mapper' argument.
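> For example, something like this might work (untested, and assuming the
> groovy launcher script on your nodes honors the JAVA_OPTS environment
> variable):
>
>     -mapper 'env JAVA_OPTS=-Xmx2000m /home/ftpuser1/Nodemapper5.groovy' \
>
> That way the setting reaches the JVM that actually runs your script,
> rather than the streaming JVM forked by the TaskTracker.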
>
> Regards,
>
> - Patrick
>
> On Mon, Jul 12, 2010 at 4:32 AM, Shuja Rehman <shujamug...@gmail.com> wrote:
> > Hi Alex,
> >
> > I have updated Java to the latest available version on all machines in
> > the cluster, and now I run the job with this line added:
> >
> > -D mapred.child.ulimit=3145728 \
> >
> > but still the same error. Here is the output of this job:
> >
> > root 7845 5674 3 01:24 pts/1 00:00:00 /usr/jdk1.6.0_03/bin/java -Xmx1023m -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20 -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -classpath /usr/lib/hadoop-0.20/conf:/usr/jdk1.6.0_03/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.3.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/hadoop-scribe-log4j-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/libfb303.jar:/usr/lib/hadoop-0.20/lib/libthrift.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2+320.jar -D mapred.child.java.opts=-Xmx2000M -D mapred.child.ulimit=3145728 -inputformat StreamInputFormat -inputreader StreamXmlRecordReader,begin=<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">,end=</mdc> -input /user/root/RNCDATA/MDFDORKUCRAR02/A20100531.0000-0700-0015-0700_RNCCN-MDFDORKUCRAR02 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks=0 -output RNC14 -mapper /home/ftpuser1/Nodemapper5.groovy -reducer org.apache.hadoop.mapred.lib.IdentityReducer -file /home/ftpuser1/Nodemapper5.groovy
> > root 7930 7632 0 01:24 pts/2 00:00:00 grep Nodemapper5.groovy
> >
> > Any clue?
> > Thanks
> >
> > On Sun, Jul 11, 2010 at 3:44 AM, Alex Kozlov <ale...@cloudera.com> wrote:
> > > Hi Shuja,
> > >
> > > First, thank you for using CDH3. Can you also check what
> > > mapred.child.ulimit you are using? Try adding
> > > "-D mapred.child.ulimit=3145728" to the command line.
> > >
> > > I would also recommend upgrading Java to JDK 1.6 Update 8 at a
> > > minimum, which you can download from the Java SE Homepage
> > > <http://java.sun.com/javase/downloads/index.jsp>.
> > >
> > > Let me know how it goes.
> > >
> > > Alex K
> > >
> > > On Sat, Jul 10, 2010 at 12:59 PM, Shuja Rehman <shujamug...@gmail.com> wrote:
> > > > Hi Alex,
> > > >
> > > > Yeah, I am running the job on a cluster of 2 machines and using the
> > > > Cloudera distribution of Hadoop. Here is the output of this command:
> > > > root 5277 5238 3 12:51 pts/2 00:00:00 /usr/jdk1.6.0_03/bin/java -Xmx1023m -Dhadoop.log.dir=/usr/lib/hadoop-0.20/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20 -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -classpath /usr/lib/hadoop-0.20/conf:/usr/jdk1.6.0_03/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.3.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/hadoop-scribe-log4j-0.20.2+320.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/hsqldb.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.0.1.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.14.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.14.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/libfb303.jar:/usr/lib/hadoop-0.20/lib/libthrift.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar org.apache.hadoop.util.RunJar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2+320.jar -D mapred.child.java.opts=-Xmx2000M -inputformat StreamInputFormat -inputreader StreamXmlRecordReader,begin=<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">,end=</mdc> -input /user/root/RNCDATA/MDFDORKUCRAR02/A20100531.0000-0700-0015-0700_RNCCN-MDFDORKUCRAR02 -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks=0 -output RNC11 -mapper /home/ftpuser1/Nodemapper5.groovy -reducer org.apache.hadoop.mapred.lib.IdentityReducer -file /home/ftpuser1/Nodemapper5.groovy
> > > > root 5360 5074 0 12:51 pts/1 00:00:00 grep Nodemapper5.groovy
> > > >
> > > > And what is meant by OOM? Thanks for helping.
> > > >
> > > > Best Regards
> > > >
> > > > On Sun, Jul 11, 2010 at 12:30 AM, Alex Kozlov <ale...@cloudera.com> wrote:
> > > > > Hi Shuja,
> > > > >
> > > > > It looks like the OOM is happening in your code. Are you running
> > > > > MapReduce in a cluster? If so, can you send the exact command line
> > > > > your code is invoked with? You can get it with a
> > > > > 'ps -Af | grep Nodemapper5.groovy' command on one of the nodes
> > > > > which is running the task.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Alex K
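> > > > > You could also check from inside the script how much heap the
> > > > > groovy JVM actually got (a quick diagnostic sketch; it prints to
> > > > > stderr so the stream output is not polluted):
> > > > >
> > > > >     // near the top of Nodemapper5.groovy
> > > > >     System.err.println("max heap MB: " + (Runtime.getRuntime().maxMemory() >> 20))
> > > > >
> > > > > If the number is far below 2000, the -Xmx setting is not reaching
> > > > > the JVM that runs your code.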
> > > > > On Sat, Jul 10, 2010 at 10:40 AM, Shuja Rehman <shujamug...@gmail.com> wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > I am facing a hard problem. I am running a map reduce job using
> > > > > > streaming, but it fails with the following error:
> > > > > >
> > > > > > Caught: java.lang.OutOfMemoryError: Java heap space
> > > > > >         at Nodemapper5.parseXML(Nodemapper5.groovy:25)
> > > > > >
> > > > > > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> > > > > >         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
> > > > > >         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
> > > > > >         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
> > > > > >         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> > > > > >         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> > > > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > > > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > > > > >         at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > > > >
> > > > > > I have increased the heap size in hadoop-env.sh to 2000M, and I
> > > > > > also tell the job directly with the following line:
> > > > > >
> > > > > > -D mapred.child.java.opts=-Xmx2000M \
> > > > > >
> > > > > > but it still gives the error. The same job runs fine if I run it
> > > > > > on the shell using a 1024M heap size, like:
> > > > > >
> > > > > > cat file.xml | /root/Nodemapper5.groovy
> > > > > >
> > > > > > Any clue?
> > > > > >
> > > > > > Thanks in advance.

--
Regards
Shuja-ur-Rehman Baig
_________________________________
MS CS - School of Science and Engineering
Lahore University of Management Sciences (LUMS)
Sector U, DHA, Lahore, 54792, Pakistan
Cell: +92 3214207445