[jira] [Commented] (MAHOUT-1328) CLI-invoked K-means final step (Cluster Classification Driver) ignores job-specific -D MR parameters

Suneel Marthi (JIRA) Mon, 02 Dec 2013 19:37:49 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837289#comment-13837289
 ]


Suneel Marthi commented on MAHOUT-1328:
---------------------------------------

[~stewh-uk] I ran this on my local Hadoop cluster with the -D CLI params you 
had provided, and I am seeing that the final cluster classification task does 
indeed execute with the provided -D CLI params. Are you seeing this issue with 
Mahout 0.8? It was fixed in 0.8 by MAHOUT-1201 (Some Mahout jobs do not pass 
user supplied Configuration object to sub jobs).
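
For readers unfamiliar with the class of bug that MAHOUT-1201's title 
describes, here is a minimal sketch of the pattern. It is not Mahout or Hadoop 
code: a plain Map stands in for the Hadoop Configuration object, every class 
and method name is hypothetical, and the -Xmx200m default is only an 
illustrative value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the pattern; NOT the actual Mahout implementation.
class ConfPropagationSketch {

    // Stand-in for a fresh Configuration that holds only cluster defaults.
    // The default heap value is an assumption chosen for illustration.
    static Map<String, String> clusterDefaults() {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.child.java.opts", "-Xmx200m");
        return conf;
    }

    // Applies -Dkey=value arguments on top of the defaults; other args are ignored.
    static Map<String, String> parseDashD(String[] args) {
        Map<String, String> conf = clusterDefaults();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return conf;
    }

    // Buggy pattern: the sub job builds a fresh configuration and drops userConf.
    static String heapSeenByBuggySubJob(Map<String, String> userConf) {
        Map<String, String> subJobConf = clusterDefaults();
        return subJobConf.get("mapred.child.java.opts");
    }

    // Fixed pattern: the sub job starts from the configuration the user supplied.
    static String heapSeenByFixedSubJob(Map<String, String> userConf) {
        Map<String, String> subJobConf = new HashMap<>(userConf);
        return subJobConf.get("mapred.child.java.opts");
    }

    public static void main(String[] args) {
        Map<String, String> userConf =
            parseDashD(new String[] {"-Dmapred.child.java.opts=-Xmx7096m"});
        System.out.println("buggy sub job: " + heapSeenByBuggySubJob(userConf));
        System.out.println("fixed sub job: " + heapSeenByFixedSubJob(userConf));
    }
}
```

In the buggy variant the sub job sees only the default heap setting, which 
matches the symptom in the report below: the iteration jobs honour the -D 
values while the final step falls back to cluster defaults.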




> CLI-invoked K-means final step (Cluster Classification Driver) ignores 
> job-specific -D MR parameters
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1328
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1328
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.8
>            Reporter: Stewart Whiting
>            Assignee: Suneel Marthi
>             Fix For: 0.9
>
>
> I believe this is an issue - someone please correct me if not!
> I am running a large k-means clustering task. Our cluster's default 
> map/reduce slots per node, JVM memory parameters, etc. are not appropriate 
> for the memory requirements of this task.
> So, I invoke K-means clustering from the CLI using, for example:
>
>   mahout kmeans -i /mahout-input -o /mahout-output -c clusters \
>     -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
>     -x 12 -ow -k 50 -cl \
>     -Dmapred.child.java.opts=-Xmx7096m \
>     -Dmapred.tasktracker.reduce.tasks.maximum=1 \
>     -Dmapred.tasktracker.map.tasks.maximum=1 \
>     -Dmapred.job.map.memory.mb=7000 \
>     -Dmapred.cluster.max.map.memory.mb=7000 \
>     -Dmapred.cluster.reduce.memory.mb=7000 \
>     -Dmapred.cluster.max.reduce.memory.mb=7000
>
> The initial MR tasks for each clustering iteration run successfully. 
> Inspecting the Hadoop config for each task after completion shows that the 
> job runs with the explicitly provided MR configuration from the -D parameters.
> However, when the final cluster classification task is run (i.e. to generate 
> the clusteredPoints/ directory), it usually fails with out-of-memory errors. 
> Inspecting the MR task logs for it shows that it ran with the default cluster 
> settings, not those provided by my -D CLI parameters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
