[
https://issues.apache.org/jira/browse/MAHOUT-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Palumbo updated MAHOUT-1440:
-----------------------------------
Description:
It was noted recently that there should be a way to set a static seed for the
initial clusters of Kmeans. In the interests of reproducibility and
benchmarking, this patch adds an option to set the seed of the RNG used in the
RandomSeedGenerator.buildRandom() method, which is called from KMeansDriver and
FuzzyKMeansDriver.
I've added a CLI option, -setRandomSeed, that produces reproducible results
from kmeans and fkmeans when it is set to the same value on each run (with the
-k option set).
This patch allows the user to set the seed value. It may make more sense to
just have a flag that uses the STANDARD_SEED from RandomWrapper.
I am still feeling my way around the codebase, so if this would be useful and
any changes are needed, let me know.
was:
It was noted recently that there should be a way to set a static seed for the
initial clusters of Kmeans. In the interests of reproducibility and
benchmarking, this patch adds an option to set the seed of the RNG used in the
RandomSeedGenerator.buildRandom() method, which is called from KMeansDriver and
FuzzyKMeansDriver.
I've added a CLI option, -setRandomSeed, that produces reproducible results
from kmeans and fkmeans when it is set to the same value on each run.
This patch allows the user to set the seed value. It may make more sense to
just have a flag that uses the STANDARD_SEED from RandomWrapper.
I am still feeling my way around the codebase, so if this would be useful and
any changes are needed, let me know.
> Add option to set the RNG seed for initial cluster generation in Kmeans/fKmeans
> -------------------------------------------------------------------------------
>
> Key: MAHOUT-1440
> URL: https://issues.apache.org/jira/browse/MAHOUT-1440
> Project: Mahout
> Issue Type: Improvement
> Components: CLI, Clustering
> Affects Versions: 1.0
> Reporter: Andrew Palumbo
> Priority: Minor
> Labels: reproducibility
> Fix For: 1.0
>
> Attachments: MAHOUT-1440.patch
>
>
> It was noted recently that there should be a way to set a static seed for the
> initial clusters of Kmeans. In the interests of reproducibility and
> benchmarking, this patch adds an option to set the seed of the RNG used in
> the RandomSeedGenerator.buildRandom() method, which is called from
> KMeansDriver and FuzzyKMeansDriver.
> I've added a CLI option, -setRandomSeed, that produces reproducible results
> from kmeans and fkmeans when it is set to the same value on each run (with
> the -k option set).
> This patch allows the user to set the seed value. It may make more sense to
> just have a flag that uses the STANDARD_SEED from RandomWrapper.
> I am still feeling my way around the codebase, so if this would be useful and
> any changes are needed, let me know.
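
For readers unfamiliar with the idea, the effect of a fixed seed on initial
cluster selection can be sketched in plain Java. This is only an illustration
under stated assumptions, not the code in the attached patch and not Mahout's
RandomSeedGenerator: the class name SeededInitialCenters, the method
pickInitialCenters, and the use of java.util.Random are all hypothetical
stand-ins. The sketch picks k distinct point indices at random and shows that
two runs with the same seed choose the same initial centers.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; this is not Mahout's RandomSeedGenerator.
class SeededInitialCenters {

    /**
     * Picks k distinct indices in [0, numPoints) uniformly at random using a
     * partial Fisher-Yates shuffle. A non-null seed makes the choice
     * repeatable; a null seed falls back to an unseeded Random, so results
     * will generally differ from run to run.
     */
    static int[] pickInitialCenters(int numPoints, int k, Long seed) {
        if (k < 0 || k > numPoints) {
            throw new IllegalArgumentException("k must be between 0 and numPoints");
        }
        Random rng = (seed == null) ? new Random() : new Random(seed);

        int[] indices = new int[numPoints];
        for (int i = 0; i < numPoints; i++) {
            indices[i] = i;
        }
        // Shuffle only the first k positions; each receives a random
        // element drawn from the not-yet-chosen remainder.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(numPoints - i);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        return Arrays.copyOf(indices, k);
    }

    public static void main(String[] args) {
        int[] first = pickInitialCenters(100, 5, 42L);
        int[] second = pickInitialCenters(100, 5, 42L);
        // Same seed, same k, same number of points: identical selection.
        System.out.println(Arrays.equals(first, second)); // prints true
    }
}
```

Note that the selection depends on k and on the number of input points as well
as on the seed, so all three must match between runs for the initial centers
to be the same.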