Andrew Palumbo created MAHOUT-1440:
--------------------------------------

             Summary: Add option to set the RNG seed for inital cluster 
generation in Kmeans/fKmeans
                 Key: MAHOUT-1440
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1440
             Project: Mahout
          Issue Type: Improvement
          Components: CLI, Clustering
    Affects Versions: 1.0
            Reporter: Andrew Palumbo
            Priority: Minor
             Fix For: 1.0


It was noted recently that there should be a way to set a static seed for the 
the initial clusters of Kmeans. In the interests of reproducibility and 
benchmarking, this patch adds an option to set the seed in the RNG used in the 
RandomSeedGenerator.buildRandom() method called from the KmeansDriver and 
FuzzyKMeansDriver.  

I've added in a CLI option -setRandomSeed that when set to the same value will 
produce reproducible results from kmeans and fkmeans.

This patch allows the user to set a value.  It may make more sense to just have 
an option to set a flag to use the STANDARD_SEED from RandomWrapper.

I am still feeling my way around the codebase so if this will be useful and 
there need to be any changes let me know.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to