[
https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938928#comment-13938928
]
mahmood commented on MAHOUT-1456:
---------------------------------
Please see more output from hadoop-2.1.0-beta below; I will send the output of
hadoop-1.2.1 later.
[hadoop@solaris hadoop-2.1.0-beta]$ ./sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
14/03/18 09:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to
/export/home/hadoop/hadoop-2.1.0-beta/logs/hadoop-hadoop-namenode-solaris.out
localhost: starting datanode, logging to
/export/home/hadoop/hadoop-2.1.0-beta/logs/hadoop-hadoop-datanode-solaris.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to
/export/home/hadoop/hadoop-2.1.0-beta/logs/hadoop-hadoop-secondarynamenode-solaris.out
14/03/18 09:36:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to
/export/home/hadoop/hadoop-2.1.0-beta/logs/yarn-hadoop-resourcemanager-solaris.out
localhost: starting nodemanager, logging to
/export/home/hadoop/hadoop-2.1.0-beta/logs/yarn-hadoop-nodemanager-solaris.out
[hadoop@solaris hadoop-2.1.0-beta]$ md5sum ~/enwiki-latest-pages-articles.xml
1d174e6998028d291eb7e2974d44b368  /export/home/hadoop/enwiki-latest-pages-articles.xml
[hadoop@solaris hadoop-2.1.0-beta]$ cd ../mahout-distribution-0.9/
[hadoop@solaris mahout-distribution-0.9]$ which hadoop
/export/home/hadoop/hadoop-2.1.0-beta/bin/hadoop
[hadoop@solaris mahout-distribution-0.9]$ ./bin/mahout wikipediaXMLSplitter -d
../enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.1.0-beta/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/export/home/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/18 09:48:41 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found
on classpath, will use command-line arguments only
14/03/18 09:48:42 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuilder.append(StringBuilder.java:132)
        at org.apache.mahout.text.wikipedia.WikipediaXmlSplitter.main(WikipediaXmlSplitter.java:208)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
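Note that the OutOfMemoryError is thrown from WikipediaXmlSplitter.main, i.e. in the client-side JVM that bin/mahout launches, so mapred.child.java.opts (which only affects map/reduce task JVMs) does not apply here. A minimal sketch of raising the client heap instead, assuming the MAHOUT_HEAPSIZE variable that the 0.9 bin/mahout script reads (verify the exact variable name against your copy of the script):

```shell
# Raise the heap of the JVM that bin/mahout launches (value in MB).
# MAHOUT_HEAPSIZE is read near the top of bin/mahout in the 0.9
# distribution; confirm it in your copy before relying on it.
export MAHOUT_HEAPSIZE=4096
echo "MAHOUT_HEAPSIZE=${MAHOUT_HEAPSIZE}"
# Then rerun the splitter in the same shell, e.g.:
#   ./bin/mahout wikipediaXMLSplitter -d ../enwiki-latest-pages-articles.xml \
#     -o wikipedia/chunks -c 64
```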
[hadoop@solaris mahout-distribution-0.9]$ cd ../hadoop-2.1.0-beta/etc/hadoop/
[hadoop@solaris hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and
authority determine the FileSystem implementation. The uri's scheme determines
the config property (fs.SCHEME.impl) naming the FileSystem implementation
class. The uri's authority is used to determine the host, port, etc. for a
filesystem.</description>
</property>
</configuration>
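As an aside, fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS (the old key still works). To double-check which filesystem URI the client actually picks up, the value can be pulled out of core-site.xml with GNU sed; a self-contained sketch using a sample file (point CONF at the real etc/hadoop/core-site.xml instead):

```shell
# Extract the fs.default.name value from a core-site.xml-style file.
# A sample file is written here so the sketch is self-contained;
# replace CONF with your real etc/hadoop/core-site.xml.
CONF=core-site-sample.xml
cat > "$CONF" <<'EOF'
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
EOF
# Print the <value> on the line following the fs.default.name <name> line.
sed -n '/<name>fs.default.name<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$CONF"
```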
[hadoop@solaris hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can
be specified when the file is created. The default is used if replication is
not specified in create time. </description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/home/hadoop/hadoop-2.1.0-beta/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/export/home/hadoop/hadoop-2.1.0-beta/dfs.name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/export/home/hadoop/hadoop-2.1.0-beta/dfs.data</value>
</property>
</configuration>
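When NameNode or DataNode storage misbehaves, one quick check is whether the configured directories exist with the exact spelling from the XML: any typo or stray whitespace inside a <value> silently points Hadoop at a different path. A small sketch, using the dfs paths from the file above:

```shell
# Check that the configured name/data dirs exist exactly as spelled
# in hdfs-site.xml; a "missing" line means the daemon will create or
# fail on a path you did not intend.
for d in /export/home/hadoop/hadoop-2.1.0-beta/dfs.name \
         /export/home/hadoop/hadoop-2.1.0-beta/dfs.data; do
  if [ -d "$d" ]; then echo "ok: $d"; else echo "missing: $d"; fi
done
```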
[hadoop@solaris hadoop]$ cat mapred-site.xml.template
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and reduce task.
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>1</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
</configuration>
[hadoop@solaris hadoop]$
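Independently of which config file is active, a JVM opts string is worth sanity-checking before it goes into mapred.child.java.opts: the heap flag is spelled -Xmx<size> (e.g. -Xmx1024m), and a malformed flag such as -Xm1024M makes every JVM that receives it abort with "Unrecognized option" before any task code runs. A rough pattern check in plain shell (the check_opts helper is purely illustrative):

```shell
# Rough sanity check for a JVM opts string: does it contain a
# well-formed -Xmx<digits><unit> heap flag? This only catches the
# common misspellings, not every invalid option.
check_opts() {
  case "$1" in
    *-Xmx[0-9]*[kKmMgG]*) echo "looks ok: $1" ;;
    *) echo "suspicious: $1" ;;
  esac
}
check_opts "-Xmx1024m"
check_opts "-Xm1024M"
```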
> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
> Key: MAHOUT-1456
> URL: https://issues.apache.org/jira/browse/MAHOUT-1456
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.9
> Environment: Solaris 11.1 \
> Hadoop 2.3.0 \
> Maven 3.2.1 \
> JDK 1.7.0_07-b10 \
> Reporter: mahmood
> Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d
> enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at
> chunk #571 and after 30 minutes fails with a Java heap space error.
> Previous chunks are created rapidly (about 10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't help.
> 4- Regardless of the configuration, there seems to be a memory leak
> that eats up all available space.
--
This message was sent by Atlassian JIRA
(v6.2#6252)