[
https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suneel Marthi reassigned MAHOUT-1328:
-------------------------------------
Assignee: Suneel Marthi
> CLI-invoked K-means final step (Cluster Classification Driver) ignores
> job-specific -D MR parameters
> ----------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1328
> URL: https://issues.apache.org/jira/browse/MAHOUT-1328
> Project: Mahout
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Stewart Whiting
> Assignee: Suneel Marthi
>
> I believe this is an issue - someone please correct me if not!
> I am running a large k-means clustering task. Our cluster's default
> map/reduce slots per node, JVM memory parameters, etc. are not appropriate
> for the memory requirements of this task.
> So, I invoke K-means clustering from the CLI using, for example:
> mahout kmeans -i /mahout-input -o /mahout-output -c clusters -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 12 -ow -k 50 -cl
> -Dmapred.child.java.opts=-Xmx7096m
> -Dmapred.tasktracker.reduce.tasks.maximum=1
> -Dmapred.tasktracker.map.tasks.maximum=1 -Dmapred.job.map.memory.mb=7000
> -Dmapred.cluster.max.map.memory.mb=7000
> -Dmapred.cluster.reduce.memory.mb=7000
> -Dmapred.cluster.max.reduce.memory.mb=7000
> The initial MR tasks for each clustering iteration run successfully.
> Inspecting the Hadoop config for each task after completion shows that the
> job runs with the explicitly provided MR configuration from the -D parameters.
> However, when the final cluster classification task is run (i.e. to generate
> the clusteredPoints/ directory), it usually fails with OutOfMemory errors.
> Inspecting the MR task logs for it shows that it ran with the default cluster
> settings, not those provided by my -D CLI parameters.
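
The symptom described above (iteration jobs honour the -D overrides, the final classification job does not) is what one would expect if the final step built a fresh configuration instead of reusing the one populated from the command line. That cause is an assumption, not something this report confirms. The sketch below is a toy model only: plain maps stand in for Hadoop's Configuration, every class and method name is hypothetical (none are Mahout or Hadoop APIs), and the -Xmx200m default is a placeholder rather than the reporter's actual cluster setting.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model only: plain maps stand in for Hadoop's Configuration.
 * All names here are hypothetical; none are Mahout or Hadoop APIs.
 */
public class ConfPropagationSketch {

    static final String OPTS_KEY = "mapred.child.java.opts";

    /** Stand-in for the cluster-wide defaults (placeholder value). */
    static Map<String, String> clusterDefaults() {
        Map<String, String> conf = new HashMap<>();
        conf.put(OPTS_KEY, "-Xmx200m");
        return conf;
    }

    /** Overlay every -Dkey=value argument on top of the cluster defaults. */
    static Map<String, String> parseDashD(String[] args) {
        Map<String, String> conf = clusterDefaults();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return conf;
    }

    /** Models a clustering iteration: it reuses the CLI configuration. */
    static String iterationStep(Map<String, String> cliConf) {
        return cliConf.get(OPTS_KEY);
    }

    /** Models the suspected bug: the step ignores cliConf and starts from defaults. */
    static String classificationStepFreshConf(Map<String, String> cliConf) {
        Map<String, String> conf = clusterDefaults(); // CLI overrides are lost here
        return conf.get(OPTS_KEY);
    }

    /** Models the expected behaviour: the step reuses the CLI configuration. */
    static String classificationStepSharedConf(Map<String, String> cliConf) {
        return cliConf.get(OPTS_KEY);
    }

    public static void main(String[] args) {
        String[] cli = {"-i", "/mahout-input", "-Dmapred.child.java.opts=-Xmx7096m"};
        Map<String, String> cliConf = parseDashD(cli);
        System.out.println("iteration:             " + iterationStep(cliConf));
        System.out.println("classification (bug):  " + classificationStepFreshConf(cliConf));
        System.out.println("classification (fix):  " + classificationStepSharedConf(cliConf));
    }
}
```

In this model the "bug" variant reports the placeholder default while the other two report -Xmx7096m, mirroring the difference the reporter observed between the iteration jobs and the classification job. If the real driver behaves this way, passing the existing configuration through to the classification step would be the natural fix.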