[ 
https://issues.apache.org/jira/browse/MAHOUT-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973994#comment-13973994
 ] 

Hudson commented on MAHOUT-1440:
--------------------------------

SUCCESS: Integrated in Mahout-Quality #2576 (See 
[https://builds.apache.org/job/Mahout-Quality/2576/])
MAHOUT-1440 Add option to set the RNG seed for initial cluster generation in 
Kmeans/fKmeans (ssc: rev 1588439)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/fuzzykmeans/FuzzyKMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/RandomSeedGenerator.java
* /mahout/trunk/core/src/main/java/org/apache/mahout/common/commandline/DefaultOptionCreator.java
* /mahout/trunk/core/src/test/java/org/apache/mahout/clustering/kmeans/TestRandomSeedGenerator.java
* /mahout/trunk/src/conf/fkmeans.props
* /mahout/trunk/src/conf/kmeans.props


> Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
> -------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1440
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1440
>             Project: Mahout
>          Issue Type: Improvement
>          Components: CLI, Clustering
>    Affects Versions: 1.0
>            Reporter: Andrew Palumbo
>            Assignee: Sebastian Schelter
>            Priority: Minor
>              Labels: reproducibility
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1440.patch
>
>
> It was noted recently that there should be a way to set a static seed for 
> the initial clusters of Kmeans. In the interests of reproducibility and 
> benchmarking, this patch adds an option to set the seed of the RNG used in 
> the RandomSeedGenerator.buildRandom() method called from the KMeansDriver and 
> FuzzyKMeansDriver.  
> I've added a CLI option -setRandomSeed that, when set to the same value 
> across runs (with the -k option set), will produce reproducible results from 
> kmeans and fkmeans.
> This patch allows the user to set a value.  It may make more sense to just 
> have an option to set a flag to use the STANDARD_SEED from RandomWrapper.
> I am still feeling my way around the codebase so if this will be useful and 
> there need to be any changes let me know.
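
For readers unfamiliar with the idea, here is a rough sketch of why a fixed
seed makes the initial cluster selection repeatable. This is NOT the Mahout
patch: the class SeededInitialCenters and its pick() method are hypothetical
names made up for illustration, it uses plain java.util.Random rather than
Mahout's RandomWrapper, and it assumes the initial centers are drawn by
reservoir sampling over the input points.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of Mahout.
final class SeededInitialCenters {

    // Picks k distinct point indices out of n using reservoir sampling.
    // A null seed lets Random seed itself, so picks vary between runs;
    // a non-null seed makes the picks identical on every run.
    static int[] pick(int n, int k, Long seed) {
        if (k < 0 || k > n) {
            throw new IllegalArgumentException("need 0 <= k <= n");
        }
        Random rng = (seed == null) ? new Random() : new Random(seed);
        int[] chosen = new int[k];
        for (int i = 0; i < n; i++) {
            if (i < k) {
                chosen[i] = i;                 // fill the reservoir first
            } else {
                int j = rng.nextInt(i + 1);    // uniform in [0, i]
                if (j < k) {
                    chosen[j] = i;             // replace a random slot
                }
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        int[] first = pick(1000, 5, 42L);
        int[] second = pick(1000, 5, 42L);
        System.out.println(Arrays.toString(first));
        // Same seed and same input size, so both runs agree.
        System.out.println(Arrays.equals(first, second));
    }
}
```

Passing null instead of 42L in the sketch corresponds to running without a
seed option: the chosen indices will generally differ from run to run, which
is what makes benchmarks hard to compare.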



--
This message was sent by Atlassian JIRA
(v6.2#6252)
