[ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939518#comment-13939518 ]

mahmood commented on MAHOUT-1456:
---------------------------------

Generally, I agree that there might be a bug in the data. But note that the 
same data file has been tested with an older Hadoop release and it works. Let's 
say that D works with H1 but at the same time doesn't work with H2. Then the 
failure must come from some difference between H1 and H2, not from D alone. So 
at this point, the claim that D has a bug is weak.

> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1456
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1456
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.9
>         Environment: Solaris 11.1
> Hadoop 2.3.0
> Maven 3.2.1
> JDK 1.7.0_07-b10
>            Reporter: mahmood
>              Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is 
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d 
> enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at 
> chunk #571 and after 30 minutes it fails with a Java heap space error. 
> Previous chunks are created rapidly (10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't help (see the 
> sketch after this list).
> 4- No matter what the configuration is, it seems that there is a memory leak 
> that eats up all the heap space.
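> A minimal sketch of where the heap setting has to go, assuming the stock 
> bin/mahout launcher shipped with 0.9, which (like Hadoop's launcher scripts) 
> reads MAHOUT_HEAPSIZE (a value in MB) and MAHOUT_OPTS for the client JVM; a 
> bare "-Xmx4096m" appended to the mahout command line would reach the tool as 
> a program argument, not as a JVM flag:
>
>     # give the driver JVM a 4 GB heap (value is in MB)
>     export MAHOUT_HEAPSIZE=4096
>     # or pass raw JVM flags through the environment instead
>     export MAHOUT_OPTS="-Xmx4096m"
>     mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml \
>       -o wikipedia/chunks -c 64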



--
This message was sent by Atlassian JIRA
(v6.2#6252)
