Hi,
>Hello.
>
>On Tue, 24 Mar 2020 at 06:39, [email protected] <[email protected]> wrote:
>>
>> Hi,
>>
>> I have started 2 PRs to solve the problem you mentioned.
>>
>> About the "CentroidInitializer" I have a new idea:
>> Move the CentroidInitializers in as inner classes of
>> "KMeansPlusPlusClusterer", and add a constructor parameter and a
>> property "useKMeansPlusPlus" to "KMeansPlusPlusClusterer":
>> ```java
>> // Add "useKMeansPlusPlus" to "KMeansPlusPlusClusterer"
>> public class KMeansPlusPlusClusterer<T extends Clusterable> extends
>> Clusterer<T> {
>> public KMeansPlusPlusClusterer(final int k, final int maxIterations,
>> final DistanceMeasure measure,
>> final UniformRandomProvider random,
>> final EmptyClusterStrategy emptyStrategy,
>> + final boolean useKMeansPlusPlus) {
>> // ...
>> - // Use K-means++ to choose the initial centers.
>> - this.centroidInitializer = new KMeansPlusPlusCentroidInitializer(measure,
>> random);
>> + this.useKMeansPlusPlus = useKMeansPlusPlus;
>> }
>> ```
>
>What if one comes up with a third way to initialize the centroids?
>If you can ensure that there is no other initialization procedure,
>a boolean is fine, if not, we could still make the existing procedures
>package-private (e.g. moving them in as classes defined within
>"KMeansPlusPlusClusterer").
As far as I know, k-means has two center initialization methods so far:
random and k-means++.
Using a boolean to choose between them is good enough for current use,
but there are two situations in which users need to implement the center
initialization themselves:
1. The Commons Math implementation is not good enough;
2. New center initialization methods appear.
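
To make the second case concrete, here is a rough sketch of a pluggable
initializer. All names and signatures below ("CentroidInitializer" taking a
List<double[]>, "RandomInitializer", "PlusPlusInitializer", java.util.Random,
squared Euclidean distance) are simplifications I made up for this mail, not
the actual Commons Math API, which works with Clusterable, DistanceMeasure
and UniformRandomProvider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical strategy interface (simplified; not the Commons Math API). */
interface CentroidInitializer {
    List<double[]> selectCentroids(List<double[]> points, int k);
}

/** Random seeding: picks k distinct points uniformly at random. */
class RandomInitializer implements CentroidInitializer {
    private final Random random;

    public RandomInitializer(Random random) { this.random = random; }

    @Override
    public List<double[]> selectCentroids(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be in [1, points.size()]");
        }
        List<double[]> pool = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            // Removing the pick from the pool guarantees distinct points.
            centers.add(pool.remove(random.nextInt(pool.size())));
        }
        return centers;
    }
}

/** k-means++ seeding with squared Euclidean distance (an assumption of this sketch). */
class PlusPlusInitializer implements CentroidInitializer {
    private final Random random;

    public PlusPlusInitializer(Random random) { this.random = random; }

    @Override
    public List<double[]> selectCentroids(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be in [1, points.size()]");
        }
        int n = points.size();
        List<double[]> centers = new ArrayList<>(k);
        centers.add(points.get(random.nextInt(n))); // first center: uniform pick
        double[] minDistSq = new double[n];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
        while (centers.size() < k) {
            // Update each point's squared distance to its nearest chosen center.
            double[] last = centers.get(centers.size() - 1);
            double sum = 0;
            for (int i = 0; i < n; i++) {
                minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), last));
                sum += minDistSq[i];
            }
            // Fallback pick, used only if every point coincides with a center.
            int chosen = random.nextInt(n);
            // Otherwise draw a point with probability proportional to minDistSq.
            double r = random.nextDouble() * sum;
            double acc = 0;
            for (int i = 0; i < n && sum > 0; i++) {
                if (minDistSq[i] == 0) {
                    continue; // already a center (or a duplicate of one)
                }
                acc += minDistSq[i];
                chosen = i; // also guards against rounding when acc never exceeds r
                if (acc > r) {
                    break;
                }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    private static double distSq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }
}
```

With something like this, a third seeding method is just another
implementation of the interface, whereas a boolean flag would need to be
replaced once a third option appears.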
>
>Also, in the current implementation of "KMeansPlusPlusClusterer", the
>initialization is not configurable ("KMeansPlusPlusCentroidInitializer").
>Perhaps we don't want to depart from the original (?) algorithm; if so,
>the new constructor could be made protected (thus simplifying the API).
k-means++ is the recommended center initialization method nowadays.
Should we let users fall back to choosing random centers? That is a
trade-off we need to decide on.
Should we make the API simple or rich?
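
For what it is worth, a minimal sketch of the "simple" option you describe
could look like the following. The class names ("SketchClusterer",
"LastPointsClusterer"), the BiFunction-based seeding type and the placeholder
seeding are all hypothetical and only illustrate the constructor visibility;
they are not the Commons Math classes.

```java
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical clusterer skeleton; only the constructor layout matters here. */
class SketchClusterer {
    /** Seeding function: (points, k) -> initial centers. */
    private final BiFunction<List<double[]>, Integer, List<double[]>> initializer;
    private final int k;

    /** Simple public API: the seeding choice is not exposed to users. */
    public SketchClusterer(int k) {
        this(k, SketchClusterer::defaultSeeding);
    }

    /** Richer entry point, visible to subclasses only. */
    protected SketchClusterer(int k,
            BiFunction<List<double[]>, Integer, List<double[]>> initializer) {
        this.k = k;
        this.initializer = initializer;
    }

    public int getK() {
        return k;
    }

    public List<double[]> initialCenters(List<double[]> points) {
        return initializer.apply(points, k);
    }

    // Placeholder standing in for k-means++ seeding: it simply takes the
    // first k points so that the sketch stays short.
    private static List<double[]> defaultSeeding(List<double[]> points, int k) {
        return points.subList(0, k);
    }
}

/** A subclass can swap the seeding without widening the public API. */
class LastPointsClusterer extends SketchClusterer {
    public LastPointsClusterer(int k) {
        super(k, (points, n) -> points.subList(points.size() - n, points.size()));
    }
}
```

The public constructor stays as small as it is today, and anyone who really
needs another seeding has to subclass.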
>
>> public boolean isUseKMeansPlusPlus() {return this.useKMeansPlusPlus;}
>
>Why should this method be defined?
To let users retrieve the clusterer's parameters, the same as "getEmptyStrategy()".
>
>Regards,
>Gilles
>
>> [...]
>