[ 
https://issues.apache.org/jira/browse/SPARK-34664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18072802#comment-18072802
 ] 

Tống Thanh Phúc commented on SPARK-34664:
-----------------------------------------

Is this still open? I'm a student and want to help with this parity issue.

> Provide silhouette score for each sample when using ClusteringEvaluator
> -----------------------------------------------------------------------
>
>                 Key: SPARK-34664
>                 URL: https://issues.apache.org/jira/browse/SPARK-34664
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.1.1
>            Reporter: Julian Jorczik
>            Priority: Minor
>
> Computing the average silhouette score is already implemented when using 
> ClusteringEvaluator. When looking at the [source 
> code|https://gitlab.com/mark91/SparkClusteringEvaluationMetrics/-/blob/master/src/main/scala/org/apache/spark/ml/evaluation/SquaredEuclideanSilhouetteEvaluator.scala]
>  of ClusteringEvaluator, I think it would be easy to provide not only the 
> average silhouette score but also the silhouette score for each sample, as 
> they are already computed (lines 95-99).
>  The silhouette score for each sample can be helpful to generate a silhouette 
> plot for instance as described in [this scikit-learn 
> article|https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html].
>  The resulting feature would be equivalent to the silhouette_samples function 
> implemented in scikit-learn.
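
For reference, the per-sample score requested above can be sketched in plain 
Python. This is only a naive O(n^2) illustration using Euclidean distance, 
not Spark's implementation: ClusteringEvaluator defaults to squared Euclidean 
distance and computes the metric on distributed data. The function name 
mirrors scikit-learn's silhouette_samples, but the helper below is a 
hypothetical stand-alone version written for this example. The singleton and 
zero-distance conventions are assumptions of the sketch.

```python
from collections import defaultdict
from math import dist


def silhouette_samples(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    """
    # Group point indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against all-identical points, where a and b are both 0.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
per_sample = silhouette_samples(points, labels)
average = sum(per_sample) / len(per_sample)
```

Averaging the per-sample values gives the single silhouette number that the 
issue says ClusteringEvaluator already reports; the list itself is what a 
silhouette plot would need.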



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

