[ https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837289#comment-13837289 ]
Suneel Marthi commented on MAHOUT-1328:
---------------------------------------
[~stewh-uk] I ran this on my local Hadoop cluster with the -D CLI params you
provided, and the final cluster classification task does execute with those
-D CLI params. Are you seeing this issue with Mahout 0.8? It has been fixed in
0.8 by MAHOUT-1201 (Some Mahout jobs do not pass user supplied Configuration
object to sub jobs).
> CLI-invoked K-means final step (Cluster Classification Driver) ignores
> job-specific -D MR parameters
> ----------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1328
> URL: https://issues.apache.org/jira/browse/MAHOUT-1328
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.8
> Reporter: Stewart Whiting
> Assignee: Suneel Marthi
> Fix For: 0.9
>
>
> I believe this is an issue; someone please correct me if not!
> I am running a large k-means clustering task. Our cluster's default map/reduce
> slots per node, JVM memory parameters, etc. are not appropriate for its
> memory requirements.
> So, I invoke K-means clustering from the CLI using, for example:
> mahout kmeans -i /mahout-input -o /mahout-output -c clusters \
>   -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
>   -x 12 -ow -k 50 -cl \
>   -Dmapred.child.java.opts=-Xmx7096m \
>   -Dmapred.tasktracker.reduce.tasks.maximum=1 \
>   -Dmapred.tasktracker.map.tasks.maximum=1 \
>   -Dmapred.job.map.memory.mb=7000 \
>   -Dmapred.cluster.max.map.memory.mb=7000 \
>   -Dmapred.cluster.reduce.memory.mb=7000 \
>   -Dmapred.cluster.max.reduce.memory.mb=7000
> The initial MR tasks for each clustering iteration run successfully.
> Inspecting the Hadoop config of each task after completion shows that the job
> runs with the MR configuration explicitly provided by the -D parameters.
> However, when the final cluster classification task runs (i.e. to generate
> the clusteredPoints/ directory), it usually fails with out-of-memory errors.
> Inspecting the MR task logs for it shows that it ran with the default cluster
> settings, not those provided by my -D CLI parameters.
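The -D overrides in the kmeans command form one fixed set that, as the report expects, should reach every MR job the driver launches, the final classification job included. As a minimal sketch, assuming bash, the hypothetical `collect_d_opts` helper below (not part of Mahout or Hadoop, and not a fix for the bug) pulls that set out of an argument list, accepting both the attached `-Dkey=value` form and the separated `-D key=value` form:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Mahout or Hadoop): print every -D
# property override found in the arguments, one per line, normalized to
# the attached "-Dkey=value" form. All other arguments are ignored.
collect_d_opts() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -D)
        # Separated form "-D key=value": emit and consume the value too.
        if [ "$#" -ge 2 ]; then
          printf '%s\n' "-D$2"
          shift
        fi
        ;;
      -D?*)
        # Attached form "-Dkey=value".
        printf '%s\n' "$1"
        ;;
    esac
    shift
  done
}

# Usage with a shortened version of the kmeans arguments from the report.
collect_d_opts -i /mahout-input -o /mahout-output -k 50 -cl \
  -Dmapred.child.java.opts=-Xmx7096m \
  -Dmapred.job.map.memory.mb=7000
```

The usage call prints the two overrides, one per line. Normalizing both forms to `-Dkey=value` simply makes the list easy to compare against each finished task's configuration, which is how the report detected that the classification step ran with the default settings.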
--
This message was sent by Atlassian JIRA
(v6.1#6144)