[ https://issues.apache.org/jira/browse/SPARK-22440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238241#comment-16238241 ]
Marco Gaido commented on SPARK-22440:
-------------------------------------

Honestly, I don't know what people are using for clustering evaluation, nor do I know where to retrieve such a statistic. My goal here was to make it easier for people to migrate their existing workloads to Spark. Since sklearn is surely one of the most widespread libraries for machine learning, existing workloads can evaluate an unsupervised clustering through Silhouette or Calinski-Harabasz. If we support both, I think adopting Spark would be easier for them.

> Add Calinski-Harabasz index to ClusteringEvaluator
> --------------------------------------------------
>
>                 Key: SPARK-22440
>                 URL: https://issues.apache.org/jira/browse/SPARK-22440
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Marco Gaido
>            Priority: Minor
>
> In SPARK-14516 we introduced ClusteringEvaluator with an implementation of
> Silhouette.
> sklearn also contains another metric for the evaluation of unsupervised
> clustering results. The metric is Calinski-Harabasz. This JIRA is to add it
> to Spark.
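
For readers unfamiliar with the metric under discussion, here is a minimal sketch of the Calinski-Harabasz index in plain Python. It is not the Spark or sklearn implementation. The function name `calinski_harabasz` and the choice to return infinity when the within-cluster dispersion is zero are assumptions made only for this illustration. The index is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom, so higher values indicate better-separated clusters.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Illustrative Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster dispersion, W the within-cluster dispersion,
    n the number of points and k the number of clusters.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by their assigned cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    def mean(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        centroid = mean(members)
        # Each cluster contributes its size times the squared distance
        # from its centroid to the overall mean.
        between += len(members) * sq_dist(centroid, overall)
        # Squared distances of the members to their own centroid.
        within += sum(sq_dist(p, centroid) for p in members)

    if within == 0.0:
        # Assumption of this sketch: perfectly tight clusters score infinity.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(calinski_harabasz(pts, [0, 0, 1, 1]))
```

Unlike Silhouette, this needs only cluster centroids and per-point distances to the point's own centroid, not pairwise distances between points.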