[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19676


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155929522
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
 KMeans kmeans = new KMeans().setK(2).setSeed(1L);
 KMeansModel model = kmeans.fit(dataset);
 
-// Evaluate clustering by computing Within Set Sum of Squared Errors.
-double WSSSE = model.computeCost(dataset);
-System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+// Make predictions
+Dataset<Row> predictions = model.transform(dataset);
+
+// Evaluate clustering by computing Silhouette score
+ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+double silhouette = evaluator.evaluate(predictions);
+System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --

Thanks. I don't plan to change the code again, but I can fix the 
capitalization if you want.





[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155928871
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,14 @@ public static void main(String[] args) {
 KMeans kmeans = new KMeans().setK(2).setSeed(1L);
 KMeansModel model = kmeans.fit(dataset);
 
-// Evaluate clustering by computing Within Set Sum of Squared Errors.
-double WSSSE = model.computeCost(dataset);
-System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+// Make predictions
+Dataset<Row> predictions = model.transform(dataset);
+
+// Evaluate clustering by computing Silhouette score
+ClusteringEvaluator evaluator = new ClusteringEvaluator();
+
+double silhouette = evaluator.evaluate(predictions);
+System.out.println("Silhouette with squared euclidean distance = " + silhouette);
--- End diff --

euclidean -> Euclidean, but not important to change unless you're touching 
the code again anyway





[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-08 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19676#discussion_r155913190
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaKMeansExample.java ---
@@ -51,9 +52,17 @@ public static void main(String[] args) {
 KMeans kmeans = new KMeans().setK(2).setSeed(1L);
 KMeansModel model = kmeans.fit(dataset);
 
-// Evaluate clustering by computing Within Set Sum of Squared Errors.
-double WSSSE = model.computeCost(dataset);
-System.out.println("Within Set Sum of Squared Errors = " + WSSSE);
+// Make predictions
+Dataset<Row> predictions = model.transform(dataset);
+
+// Evaluate clustering by computing Silhouette score
+ClusteringEvaluator evaluator = new ClusteringEvaluator()
+  .setFeaturesCol("features")
+  .setPredictionCol("prediction")
--- End diff --

We use default values here, so it's not necessary to set them explicitly. 
We should keep examples as simple as possible. Thanks.





[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-11-06 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19676

[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples

## What changes were proposed in this pull request?

SPARK-14516 introduced ClusteringEvaluator, but the documentation did not 
reference it, and the examples still relied on the sum of squared errors to 
show how to evaluate a clustering model.

This PR adds ClusteringEvaluator to the examples.
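
To make the difference between the two metrics concrete, here is a minimal plain-Java sketch that computes both the Within Set Sum of Squared Errors and a Silhouette score based on squared Euclidean distance for a small, hand-labelled dataset. It is an illustration only and is not Spark's implementation: the class `SilhouetteSketch`, its method names, and the sample points are all hypothetical, and the quadratic pairwise loop is chosen for readability rather than scalability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spark.
final class SilhouetteSketch {

  // Squared Euclidean distance between two points of equal dimension.
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Within Set Sum of Squared Errors: squared distance of every point
  // to the centroid of its assigned cluster, summed over all points.
  static double wssse(double[][] points, int[] labels) {
    final int dim = points[0].length;
    Map<Integer, double[]> centroids = new HashMap<>();
    Map<Integer, Integer> counts = new HashMap<>();
    for (int i = 0; i < points.length; i++) {
      double[] sum = centroids.computeIfAbsent(labels[i], k -> new double[dim]);
      for (int j = 0; j < dim; j++) {
        sum[j] += points[i][j];
      }
      counts.merge(labels[i], 1, Integer::sum);
    }
    // Turn the per-cluster coordinate sums into centroids in place.
    for (Map.Entry<Integer, double[]> e : centroids.entrySet()) {
      int n = counts.get(e.getKey());
      for (int j = 0; j < dim; j++) {
        e.getValue()[j] /= n;
      }
    }
    double cost = 0.0;
    for (int i = 0; i < points.length; i++) {
      cost += squaredDistance(points[i], centroids.get(labels[i]));
    }
    return cost;
  }

  // Mean silhouette coefficient using squared Euclidean distance.
  // For each point: a = mean distance to the other points of its own cluster,
  // b = smallest mean distance to the points of any other cluster,
  // s = (b - a) / max(a, b). Points in singleton clusters contribute 0.
  static double silhouette(double[][] points, int[] labels) {
    int n = points.length;
    double total = 0.0;
    for (int i = 0; i < n; i++) {
      Map<Integer, Double> distSum = new HashMap<>();
      Map<Integer, Integer> count = new HashMap<>();
      for (int j = 0; j < n; j++) {
        if (j == i) {
          continue;
        }
        distSum.merge(labels[j], squaredDistance(points[i], points[j]), Double::sum);
        count.merge(labels[j], 1, Integer::sum);
      }
      Integer own = count.get(labels[i]);
      if (own == null) {
        continue; // singleton cluster: contributes 0
      }
      double a = distSum.get(labels[i]) / own;
      double b = Double.POSITIVE_INFINITY;
      for (Map.Entry<Integer, Double> e : distSum.entrySet()) {
        if (e.getKey().intValue() != labels[i]) {
          b = Math.min(b, e.getValue() / count.get(e.getKey()));
        }
      }
      if (Double.isInfinite(b)) {
        continue; // only one cluster present: treated as 0 in this sketch
      }
      double max = Math.max(a, b);
      total += max == 0.0 ? 0.0 : (b - a) / max;
    }
    return total / n;
  }

  public static void main(String[] args) {
    // Two tight, well-separated clusters (made-up sample data).
    double[][] points = {{0, 0}, {0, 1}, {10, 0}, {10, 1}};
    int[] labels = {0, 0, 1, 1};
    System.out.println("Within Set Sum of Squared Errors = " + wssse(points, labels));
    System.out.println("Silhouette with squared Euclidean distance = "
        + silhouette(points, labels));
  }
}
```

The sum of squared errors is an unbounded cost that depends on the scale of the data, whereas the Silhouette score lies in [-1, 1], with values near 1 indicating compact, well-separated clusters. That bounded range makes it easier to interpret in an example.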

## How was this patch tested?

Manual runs of the examples.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-14516_examples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19676


commit 4c4f83e97d9bd2d8771452498581bf9ce43bd28d
Author: Marco Gaido 
Date:   2017-11-06T15:49:17Z

[SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to examples



