[ 
https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1328.
-----------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: 0.9)
                   0.8

This was fixed by MAHOUT-1201 in 0.8.

> CLI-invoked K-means final step (Cluster Classification Driver) ignores 
> job-specific -D MR parameters
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1328
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1328
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.8
>            Reporter: Stewart Whiting
>            Assignee: Suneel Marthi
>             Fix For: 0.8
>
>
> I believe this is an issue - someone please correct me if not!
> I am running a large k-means clustering task. Our cluster's default map/reduce 
> slots per node, JVM memory parameters, etc. are not appropriate for its 
> memory requirements.
> So, I invoke K-means clustering from the CLI using, for example:
> mahout kmeans -i /mahout-input -o /mahout-output -c clusters -dm 
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 12 -ow -k 50 -cl 
> -Dmapred.child.java.opts=-Xmx7096m 
> -Dmapred.tasktracker.reduce.tasks.maximum=1 
> -Dmapred.tasktracker.map.tasks.maximum=1 -Dmapred.job.map.memory.mb=7000 
> -Dmapred.cluster.max.map.memory.mb=7000 
> -Dmapred.cluster.reduce.memory.mb=7000 
> -Dmapred.cluster.max.reduce.memory.mb=7000
> The initial MR tasks for each clustering iteration run successfully. 
> Inspecting the Hadoop config for each task after completion shows that the job 
> ran with the explicitly provided MR configuration from the -D parameters.
> However, when the final cluster classification task runs (i.e. to generate 
> the clusteredPoints/ directory), it usually fails with out-of-memory errors. 
> Inspecting its MR task logs shows that it ran with the default cluster 
> settings, not those provided by my -D CLI parameters.
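
The symptom described above (earlier stages honor the -D overrides, the final stage falls back to cluster defaults) can be illustrated with a small sketch. This is not Mahout or Hadoop code, and it does not claim to show the actual cause fixed in MAHOUT-1201. Every name here (`CLUSTER_DEFAULTS`, `parse_d_options`, `effective_conf`, `run_driver`) is hypothetical, and the default value is a placeholder. It only shows one general way such behavior can arise: a driver that parses the overrides once but lets a later stage build its configuration from scratch.

```python
# Hypothetical sketch: none of these names come from Mahout or Hadoop.
# The default below is a placeholder, not a real cluster setting.
CLUSTER_DEFAULTS = {"mapred.child.java.opts": "-Xmx200m"}


def parse_d_options(argv):
    """Split argv into -Dkey=value overrides and the remaining arguments."""
    overrides, rest = {}, []
    for arg in argv:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            overrides[key] = value
        else:
            rest.append(arg)
    return overrides, rest


def effective_conf(overrides=None):
    """Return the defaults with any overrides layered on top."""
    conf = dict(CLUSTER_DEFAULTS)
    conf.update(overrides or {})
    return conf


def run_driver(argv, propagate):
    """Return the configuration seen by the iteration and classification stages."""
    overrides, _ = parse_d_options(argv)
    iteration_conf = effective_conf(overrides)
    # propagate=False models the reported symptom: the final stage starts
    # from a fresh default configuration and never sees the overrides.
    classify_conf = effective_conf(overrides if propagate else None)
    return iteration_conf, classify_conf


if __name__ == "__main__":
    argv = ["kmeans", "-k", "50", "-Dmapred.child.java.opts=-Xmx7096m"]
    for propagate in (False, True):
        iteration_conf, classify_conf = run_driver(argv, propagate)
        print(propagate, iteration_conf, classify_conf)
```

With `propagate=False` the classification stage sees only the placeholder defaults, which matches the behavior reported in the task logs. With `propagate=True` both stages share the same overrides.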



--
This message was sent by Atlassian JIRA
(v6.1#6144)
