Hi,

>Hello.
>
>2020-03-06 9:48 UTC+01:00, chentao...@qq.com <chentao...@qq.com>:
>> Hi,
>>     For centroid-based clustering algorithms in machine learning, we often
>> use the Calinski-Harabasz score to evaluate which algorithm, or how many
>> centers, is best for a dataset.
>>     The Python library sklearn implements the Calinski-Harabasz score as
>> sklearn.metrics.calinski_harabasz_score.
>
>Could you post a reference (most of our documentation points
>to "Wikipedia" or "MathWorld")? 

As far as I know, the Calinski-Harabasz score is the most popular evaluator
for centroid-based clustering.
I just read the sklearn code, and I think it is easy to implement.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
https://www.tandfonline.com/doi/abs/10.1080/03610927408827101
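
To make the definition concrete, here is a minimal standalone sketch of the
score as I understand it: with n points in k clusters,
score = (B / (k - 1)) / (W / (n - k)), where B is the size-weighted sum of
squared distances from each cluster centroid to the global centroid and W is
the sum of squared distances from each point to its own cluster centroid.
The class name "CalinskiHarabaszSketch", the plain List<List<double[]>> input
and the value returned when W is zero are assumptions made for illustration;
none of this is existing Commons Math API.

```java
import java.util.List;

/** Hypothetical standalone helper (not part of Commons Math). */
final class CalinskiHarabaszSketch {
    private CalinskiHarabaszSketch() {}

    /**
     * @param clusters one entry per cluster, each a non-empty list of points
     * @return the Calinski-Harabasz score (higher means better separation)
     */
    static double score(List<List<double[]>> clusters) {
        final int k = clusters.size();
        int n = 0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                throw new IllegalArgumentException("empty cluster");
            }
            n += cluster.size();
        }
        if (k < 2 || n <= k) {
            throw new IllegalArgumentException("need 2 <= k < n");
        }
        final int dim = clusters.get(0).get(0).length;

        // Per-cluster centroids and the global centroid.
        final double[][] centers = new double[k][dim];
        final double[] global = new double[dim];
        for (int i = 0; i < k; i++) {
            final List<double[]> cluster = clusters.get(i);
            for (double[] p : cluster) {
                for (int d = 0; d < dim; d++) {
                    centers[i][d] += p[d];
                    global[d] += p[d];
                }
            }
            for (int d = 0; d < dim; d++) {
                centers[i][d] /= cluster.size();
            }
        }
        for (int d = 0; d < dim; d++) {
            global[d] /= n;
        }

        // Between-cluster (B) and within-cluster (W) dispersion.
        double between = 0;
        double within = 0;
        for (int i = 0; i < k; i++) {
            final List<double[]> cluster = clusters.get(i);
            between += cluster.size() * squaredDistance(centers[i], global);
            for (double[] p : cluster) {
                within += squaredDistance(p, centers[i]);
            }
        }
        if (within == 0) {
            // Assumption of this sketch: avoid dividing by zero.
            return 1.0;
        }
        return (between / (k - 1)) / (within / (n - k));
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            final double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

For example, the two clusters {(0,0), (0,2)} and {(10,0), (10,2)} give
B = 100 and W = 4, so the score is (100 / 1) / (4 / 2) = 50.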

>
>> I think there should be a CalinskiHarabaszClusterEvaluator in commons math:
>
>At first sight, the approach would be to define a functional
>interface (with the "score" method).
>Then an "enum" that would be a factory of evaluators, along
>the lines of what has been done in "Commons RNG" (see class
>"RandomSource"[1]). 

I just followed the existing design of "ClusterEvaluator";
I think changing the design of the existing API is a separate question.
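
For reference, my reading of the functional-interface-plus-enum idea
suggested above is roughly the following. This is only a sketch: the names
"ClusterScorer" and "ClusterScorerSource", the "create()" method and the
List<List<double[]>> input are all hypothetical, and the single constant is
just a simple placeholder evaluator (within-cluster sum of squares).

```java
import java.util.List;

// Hypothetical functional interface; name and signature are illustrative only.
@FunctionalInterface
interface ClusterScorer {
    double score(List<List<double[]>> clusters);
}

// Hypothetical enum acting as a factory of evaluators.
enum ClusterScorerSource {
    /** Sum of squared distances from each point to its cluster centroid. */
    WITHIN_SUM_OF_SQUARES {
        @Override
        ClusterScorer create() {
            return clusters -> {
                double sum = 0;
                // Assumes every cluster is non-empty.
                for (List<double[]> cluster : clusters) {
                    final int dim = cluster.get(0).length;
                    final double[] center = new double[dim];
                    for (double[] p : cluster) {
                        for (int d = 0; d < dim; d++) {
                            center[d] += p[d] / cluster.size();
                        }
                    }
                    for (double[] p : cluster) {
                        for (int d = 0; d < dim; d++) {
                            final double diff = p[d] - center[d];
                            sum += diff * diff;
                        }
                    }
                }
                return sum;
            };
        }
    };

    /** Creates the evaluator associated with this constant. */
    abstract ClusterScorer create();
}
```

A caller would then write something like
ClusterScorerSource.WITHIN_SUM_OF_SQUARES.create().score(clusters),
and a new evaluator would be added as a new enum constant.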

>
>> ```java
>> package org.apache.commons.math4.ml.clustering.evaluation;
>>
>> import org.apache.commons.math4.ml.clustering.Cluster;
>> import org.apache.commons.math4.ml.clustering.Clusterable;
>>
>> import java.util.List;
>>
>> public class CalinskiHarabaszClusterEvaluator<T extends Clusterable> extends ClusterEvaluator<T> {
>>     @Override
>>     public double score(List<? extends Cluster<T>> clusters) {
>>         //TODO: Implement the Calinski-Harabasz Score algorithm
>>         return 0;
>>     }
>>
>>     @Override
>>     public boolean isBetterScore(double score1, double score2) {
>>         return score1 > score2;
>>     }
>
>This method does not seem very useful.
>
>> }
>> ```
>>
>> The code can be implemented by reading the algorithm's documentation,
>> or translated from Python's sklearn.metrics.calinski_harabasz_score.
>
>What's the license of that code? 

sklearn is under the BSD license.
I think the math ml package already references sklearn quite a lot,
for example: org.apache.commons.math4.userguide.ClusterAlgorithmComparison

>
>Regards,
>Gilles
>
>[1] 
>https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html
>