[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...
GitHub user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19676

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155929522

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
     KMeans kmeans = new KMeans().setK(2).setSeed(1L);
     KMeansModel model = kmeans.fit(dataset);
 
-    // Evaluate clustering by computing Within Set Sum of Squared Errors.
-    double WSSSE = model.computeCost(dataset);
-    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+    // Make predictions
+    Dataset<Row> predictions = model.transform(dataset);
+
+    // Evaluate clustering by computing Silhouette score
+    ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+    double silhouette = evaluator.evaluate(predictions);
+    System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --

Thanks, I don't think I am changing the code again, but I can fix this grammatical error if you want.
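For readers unfamiliar with the metric printed above, here is a minimal plain-Java sketch of the textbook silhouette definition using squared Euclidean distance, the distance named in the example's output label. It does not use Spark and is not how `ClusteringEvaluator` computes the score internally; the class and method names (`SilhouetteSketch`, `silhouette`, `squaredDistance`) are hypothetical. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to the members of any other cluster, and the point scores `(b - a) / max(a, b)`. A point alone in its cluster scores 0, and the result is the mean over all points.

```java
// Hypothetical illustration of the silhouette score; not Spark's implementation.
public class SilhouetteSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] x, double[] y) {
        double sum = 0.0;
        for (int d = 0; d < x.length; d++) {
            double diff = x[d] - y[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Mean silhouette over all points. labels[i] is the cluster of points[i],
    // numbered 0..k-1; at least two clusters must be non-empty.
    static double silhouette(double[][] points, int[] labels, int k) {
        int n = points.length;
        int[] sizes = new int[k];
        for (int label : labels) {
            sizes[label]++;
        }
        int nonEmpty = 0;
        for (int size : sizes) {
            if (size > 0) {
                nonEmpty++;
            }
        }
        if (nonEmpty < 2) {
            throw new IllegalArgumentException("silhouette needs at least two non-empty clusters");
        }

        double total = 0.0;
        for (int i = 0; i < n; i++) {
            int own = labels[i];
            // A point alone in its cluster contributes 0 by convention.
            if (sizes[own] == 1) {
                continue;
            }
            // Sum of squared distances from point i to the members of each cluster.
            double[] sums = new double[k];
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    sums[labels[j]] += squaredDistance(points[i], points[j]);
                }
            }
            // a: mean distance to the other members of the point's own cluster.
            double a = sums[own] / (sizes[own] - 1);
            // b: smallest mean distance to the members of any other non-empty cluster.
            double b = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                if (c != own && sizes[c] > 0) {
                    b = Math.min(b, sums[c] / sizes[c]);
                }
            }
            // If both means are 0 (duplicate points across clusters), add 0 to avoid 0/0.
            double denom = Math.max(a, b);
            if (denom > 0.0) {
                total += (b - a) / denom;
            }
        }
        return total / n;
    }

    public static void main(String[] args) {
        // Two tight, well-separated clusters of two points each.
        double[][] points = { {0.0, 0.0}, {0.0, 1.0}, {10.0, 0.0}, {10.0, 1.0} };
        int[] labels = {0, 0, 1, 1};
        double score = silhouette(points, labels, 2);
        System.out.println("Silhouette with squared Euclidean distance = " + score);
    }
}
```

On the four points in `main`, every point has `a = 1` and `b = 100.5`, so the score is 99.5 / 100.5, about 0.990, close to the maximum of 1. The naive loop is O(n^2) and only meant to show the definition on a tiny in-memory array.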
GitHub user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155928871

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
     KMeans kmeans = new KMeans().setK(2).setSeed(1L);
     KMeansModel model = kmeans.fit(dataset);
 
-    // Evaluate clustering by computing Within Set Sum of Squared Errors.
-    double WSSSE = model.computeCost(dataset);
-    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+    // Make predictions
+    Dataset<Row> predictions = model.transform(dataset);
+
+    // Evaluate clustering by computing Silhouette score
+    ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+    double silhouette = evaluator.evaluate(predictions);
+    System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --

euclidean -> Euclidean, but not important to change unless you're touching the code again anyway
GitHub user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155913190

--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,17 @@ public static void main(String[] args) {
     KMeans kmeans = new KMeans().setK(2).setSeed(1L);
     KMeansModel model = kmeans.fit(dataset);
 
-    // Evaluate clustering by computing Within Set Sum of Squared Errors.
-    double WSSSE = model.computeCost(dataset);
-    System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+    // Make predictions
+    Dataset<Row> predictions = model.transform(dataset);
+
+    // Evaluate clustering by computing Silhouette score
+    ClusteringEvaluator evaluator = new ClusteringEvaluator()
+      .setFeaturesCol("features")
+      .setPredictionCol("prediction")
--- End diff --

We use default values here, so it's not necessary to set them explicitly. We should keep examples as simple as possible. Thanks.
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19676

[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples

## What changes were proposed in this pull request?

SPARK-14516 introduced ClusteringEvaluator, but the documentation did not reference it, and the examples still relied on the sum of squared errors to show how to evaluate a clustering model.

This PR adds ClusteringEvaluator to the examples.

## How was this patch tested?

Manual runs of the examples.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-14516_examples

Alternatively, you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19676.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #19676

commit 4c4f83e97d9bd2d8771452498581bf9ce43bd28d
Author: Marco Gaido
Date: 2017-11-06T15:49:17Z

[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples
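For comparison, the old example printed the Within Set Sum of Squared Errors returned by `model.computeCost(dataset)`. A minimal plain-Java sketch of that quantity follows, assuming the cluster labels and centers are already known; `WssseSketch` and `wssse` are hypothetical names, not Spark APIs. Unlike the silhouette score, which lies in [-1, 1], WSSSE has no fixed range and grows with the scale of the features.

```java
// Hypothetical illustration of WSSSE; not Spark's computeCost implementation.
public class WssseSketch {

    // Within Set Sum of Squared Errors: the sum, over all points, of the squared
    // Euclidean distance from each point to the center of its assigned cluster.
    static double wssse(double[][] points, int[] labels, double[][] centers) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] center = centers[labels[i]];
            for (int d = 0; d < center.length; d++) {
                double diff = points[i][d] - center[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = { {0.0, 0.0}, {0.0, 1.0}, {10.0, 0.0}, {10.0, 1.0} };
        int[] labels = {0, 0, 1, 1};
        // Each center is the mean of the points assigned to its cluster.
        double[][] centers = { {0.0, 0.5}, {10.0, 0.5} };
        System.out.println("Within Set Sum of Squared Errors = " + wssse(points, labels, centers));
    }
}
```

With these four points and the centers placed at the cluster means, each point is 0.5 away from its center, so the printed total is 1.0.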