Hi,
I have opened 2 PRs to address the problems you mentioned.
About the "CentroidInitializer", I have a new idea:
Move the CentroidInitializers into "KMeansPlusPlusClusterer" as nested classes,
and add a constructor parameter and a property "useKMeansPlusPlus" to
"KMeansPlusPlusClusterer":
```java
// Add "useKMeansPlusPlus" to "KMeansPlusPlusClusterer"
public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T> {
    public KMeansPlusPlusClusterer(final int k, final int maxIterations,
                                   final DistanceMeasure measure,
                                   final UniformRandomProvider random,
                                   final EmptyClusterStrategy emptyStrategy,
+                                  final boolean useKMeansPlusPlus) {
        // ...
-       // Use K-means++ to choose the initial centers.
-       this.centroidInitializer = new KMeansPlusPlusCentroidInitializer(measure, random);
+       this.useKMeansPlusPlus = useKMeansPlusPlus;
    }

    public boolean isUseKMeansPlusPlus() { return this.useKMeansPlusPlus; }

    // Make "chooseInitialCenters" package-private and call
    // "CentroidInitializer.selectCentroids".
    // Then "chooseInitialCenters" can be reused by "MiniBatchKMeans".
    List<CentroidCluster<T>> chooseInitialCenters(final Collection<T> points) {
        // Use K-means++ or random selection, depending on the flag.
        final CentroidInitializer centroidInitializer = useKMeansPlusPlus
            ? new KMeansPlusPlusCentroidInitializer(this.measure, this.random)
            : new RandomCentroidInitializer(this.random);
        return centroidInitializer.selectCentroids(points, this.k);
    }

    // Make CentroidInitializer private
    private static interface CentroidInitializer {
        <T extends Clusterable> List<CentroidCluster<T>> selectCentroids(
            final Collection<T> points, final int k);
    }

    private static class RandomCentroidInitializer implements CentroidInitializer {...}

    private static class KMeansPlusPlusCentroidInitializer implements CentroidInitializer {...}
}
```
The "CentroidInitializer" is only used in "KMeansPlusPlusClusterer" and
"MiniBatchKMeans"; the other k-means based algorithms take a
"KMeansPlusPlusClusterer" as a parameter.
```java
// Changes in "MiniBatchKMeansClusterer"
public class MiniBatchKMeansClusterer<T extends Clusterable>
    extends KMeansPlusPlusClusterer<T> {
    public MiniBatchKMeansClusterer(final int k,
                                    final int maxIterations,
                                    final int batchSize,
                                    final int initIterations,
                                    final int initBatchSize,
                                    final int maxNoImprovementTimes,
                                    final DistanceMeasure measure,
                                    final UniformRandomProvider random,
                                    final EmptyClusterStrategy emptyStrategy,
+                                   final boolean useKMeansPlusPlus) {
-       super(k, maxIterations, measure, random, emptyStrategy);
+       super(k, maxIterations, measure, random, emptyStrategy, useKMeansPlusPlus);
        //...
    }
    //...
    private List<CentroidCluster<T>> initialCenters(final List<T> points) {
        //...
-       final List<CentroidCluster<T>> clusters =
-           getCentroidInitializer().selectCentroids(initialPoints, getK());
+       final List<CentroidCluster<T>> clusters = chooseInitialCenters(initialPoints);
        //...
    }
}
```
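To make the proposal concrete, here is a minimal standalone sketch of the same structure. It is not the actual Commons Math code: the class names "KMeansSketch" and "MiniBatchSketch" are hypothetical, points are plain double[] instead of "Clusterable", java.util.Random stands in for "UniformRandomProvider", the distance is fixed to squared Euclidean, and the mini-batch simply takes the first "initBatchSize" points. It only illustrates a boolean flag choosing between two private nested initializers, and a subclass reusing the package-private "chooseInitialCenters".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified stand-in for KMeansPlusPlusClusterer.
class KMeansSketch {
    private final int k;
    private final Random random;
    private final boolean useKMeansPlusPlus;

    KMeansSketch(final int k, final Random random, final boolean useKMeansPlusPlus) {
        this.k = k;
        this.random = random;
        this.useKMeansPlusPlus = useKMeansPlusPlus;
    }

    boolean isUseKMeansPlusPlus() { return useKMeansPlusPlus; }

    // Package-private so that a subclass in the same package can reuse it.
    List<double[]> chooseInitialCenters(final List<double[]> points) {
        final CentroidInitializer initializer = useKMeansPlusPlus
            ? new KMeansPlusPlusCentroidInitializer(random)
            : new RandomCentroidInitializer(random);
        return initializer.selectCentroids(points, k);
    }

    // Private: callers only see the boolean flag, not the strategy types.
    private interface CentroidInitializer {
        List<double[]> selectCentroids(List<double[]> points, int k);
    }

    // Picks up to k distinct points uniformly at random (partial Fisher-Yates shuffle).
    private static final class RandomCentroidInitializer implements CentroidInitializer {
        private final Random random;
        RandomCentroidInitializer(final Random random) { this.random = random; }

        @Override
        public List<double[]> selectCentroids(final List<double[]> points, final int k) {
            final List<double[]> pool = new ArrayList<>(points);
            final List<double[]> centers = new ArrayList<>();
            for (int i = 0; i < k && i < pool.size(); i++) {
                final int j = i + random.nextInt(pool.size() - i);
                final double[] tmp = pool.get(i);
                pool.set(i, pool.get(j));
                pool.set(j, tmp);
                centers.add(pool.get(i).clone());
            }
            return centers;
        }
    }

    // k-means++ seeding: each new center is drawn with probability proportional
    // to its squared distance to the nearest center chosen so far.
    private static final class KMeansPlusPlusCentroidInitializer implements CentroidInitializer {
        private final Random random;
        KMeansPlusPlusCentroidInitializer(final Random random) { this.random = random; }

        @Override
        public List<double[]> selectCentroids(final List<double[]> points, final int k) {
            final List<double[]> centers = new ArrayList<>();
            final int n = points.size();
            if (n == 0 || k <= 0) {
                return centers;
            }
            centers.add(points.get(random.nextInt(n)).clone());
            final double[] minDistSq = new double[n];
            Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
            while (centers.size() < k) {
                final double[] last = centers.get(centers.size() - 1);
                double sum = 0;
                for (int i = 0; i < n; i++) {
                    minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), last));
                    sum += minDistSq[i];
                }
                if (sum == 0) {
                    break; // every point coincides with a center: return fewer than k
                }
                double r = random.nextDouble() * sum;
                int chosen = -1;
                for (int i = 0; i < n; i++) {
                    if (minDistSq[i] > 0) {
                        chosen = i; // remember the last candidate in case of rounding
                        r -= minDistSq[i];
                        if (r < 0) {
                            break;
                        }
                    }
                }
                centers.add(points.get(chosen).clone());
            }
            return centers;
        }

        private static double distSq(final double[] a, final double[] b) {
            double s = 0;
            for (int i = 0; i < a.length; i++) {
                s += (a[i] - b[i]) * (a[i] - b[i]);
            }
            return s;
        }
    }
}

// Hypothetical stand-in for MiniBatchKMeansClusterer: it only forwards the flag.
class MiniBatchSketch extends KMeansSketch {
    private final int initBatchSize;

    MiniBatchSketch(final int k, final int initBatchSize,
                    final Random random, final boolean useKMeansPlusPlus) {
        super(k, random, useKMeansPlusPlus);
        this.initBatchSize = initBatchSize;
    }

    List<double[]> initialCenters(final List<double[]> points) {
        // Simplification: the first initBatchSize points play the role of the initial batch.
        final List<double[]> batch = points.subList(0, Math.min(initBatchSize, points.size()));
        return chooseInitialCenters(batch);
    }
}
```

With this layout, "new MiniBatchSketch(2, 5, new Random(7), false).initialCenters(points)" would switch to random seeding without the caller ever seeing an initializer type. The cost is that a third seeding strategy could no longer be expressed by a boolean.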
>Hi Tao.
>
>I've merged PR #128 but please see my comment on the JIRA page.[1]
>
>Thanks for your interest in improving the library,
>Gilles
>
>[1]
>https://issues.apache.org/jira/browse/MATH-1509?focusedCommentId=17064306&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17064306
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]