[ 
https://issues.apache.org/jira/browse/SPARK-22440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238241#comment-16238241
 ] 

Marco Gaido commented on SPARK-22440:
-------------------------------------

Honestly, I don't know what people are using for clustering evaluation, nor 
do I know where to retrieve such a statistic. My goal here was to make it 
easier for people to migrate their existing workloads to Spark. Since sklearn 
is surely one of the most widespread libraries for machine learning, existing 
workloads can evaluate an unsupervised clustering through either Silhouette 
or Calinski-Harabasz. If we support both, I think adopting Spark would be 
easier for them.
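
For readers unfamiliar with the metric, here is a minimal sketch of the 
standard Calinski-Harabasz definition: the between-cluster dispersion divided 
by (k - 1), over the within-cluster dispersion divided by (n - k). It is plain 
Python for illustration only. It is neither the sklearn implementation nor the 
proposed Spark one, and the function name and the handling of zero 
within-cluster dispersion are assumptions of this sketch.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Illustrative Calinski-Harabasz index for small in-memory data.

    points: sequence of equal-length numeric tuples
    labels: sequence of cluster ids, one per point
    """
    n = len(points)
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    dim = len(points[0])
    # Centroid of the whole dataset.
    overall = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0  # size-weighted squared distance of cluster centroids to overall centroid
    within = 0.0   # squared distance of each point to its own cluster centroid
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )

    if within == 0.0:
        # Assumption of this sketch: perfectly tight clusters score as infinity.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))
```

As a hand-checkable case, the four points (0, 0), (0, 2), (10, 0), (10, 2) 
with labels 0, 0, 1, 1 give a between-cluster dispersion of 100 and a 
within-cluster dispersion of 4, so the index is (100 / 1) / (4 / 2) = 50. 
Higher values indicate better-separated, more compact clusters.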

> Add Calinski-Harabasz index to ClusteringEvaluator
> --------------------------------------------------
>
>                 Key: SPARK-22440
>                 URL: https://issues.apache.org/jira/browse/SPARK-22440
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Marco Gaido
>            Priority: Minor
>
> In SPARK-14516 we introduced ClusteringEvaluator with an implementation of 
> Silhouette.
> sklearn also contains another metric for evaluating unsupervised 
> clustering results: Calinski-Harabasz. This JIRA is to add it to Spark.


