[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16654 gentle ping @zhengruifeng

2017-01-26 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16654 @zhengruifeng don't most ML libraries have separate clustering evaluators? For example, WEKA has a ClusterEvaluation class. Scikit-learn just has a metrics class and functions you can call,

2017-01-24 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen I agree that a metric should be independent of the details of the algorithm. AUC is independent of the algorithm; it depends only on the dataset: In spark-ml, scikit-learn, or any other

2017-01-24 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Sure, and classification metrics like AUC only make sense for classifiers that output more than just a label -- they have to output a probability or score of some kind. Not every metric necessarily

2017-01-23 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Existing metrics (WSSSE, Loglikelihood) depend on the details of the algorithm. Computing WSSSE for KMeans/BisectKMeans uses the mean vectors as the centers, but for KMedoids the medoids,

2017-01-23 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16654 Also, if some metrics are only applicable to some models, as srowen noted, we can either make separate evaluator classes or put all metrics on one but throw if the model does not support that

2017-01-23 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16654 Wouldn't we eventually want to add a lot more clustering metrics like Dunn, Davies-Bouldin, Simplified Silhouette etc... there are a lot of clustering metrics and it seems like a good idea to

2017-01-22 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Metrics evaluate the clustering though; the details of the algorithm are irrelevant. This still clusters points in a continuous space so you can measure WSSSE.

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen The concept of `center` doesn't exist in DBSCAN.

2017-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71805/ Test FAILed.

2017-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71805/testReport)** for PR 16654 at commit

2017-01-22 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 I agree that clustering metrics are different from classification metrics, but that doesn't mean they can't have some common abstraction -- they're applied to a model and data set and produce a

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 @srowen I think I had not clarified my thoughts. WSSSE and Loglikelihood are algorithm-specific metrics. For example: WSSSE doesn't make sense for clustering algorithms like DBSCAN,

2017-01-22 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 Yes, I think this is at best a duplicate of SPARK-14516. You don't want to add ad-hoc methods for this.

2017-01-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71805/testReport)** for PR 16654 at commit

2017-01-22 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Jenkins, retest this please

2017-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71799/ Test FAILed.

2017-01-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71799/testReport)** for PR 16654 at commit

2017-01-21 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 I think clustering metrics are currently not that general compared with classification/regression metrics: WSSSE only applies to `KMeans` and `BiKMeans`; Loglikelihood only applies to `GMM`

2017-01-20 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16654 +1 with @srowen, this should be limited to the evaluator/metrics classes. If we have an evaluator for clustering, will we be able to use it with a hyperparameter tuner (cross validate)?

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71717/ Test FAILed.

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71717/testReport)** for PR 16654 at commit

2017-01-20 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16654 General question: isn't this what Evaluators are for?

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71717/testReport)** for PR 16654 at commit

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71710/ Test FAILed.

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71710/testReport)** for PR 16654 at commit

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71710/testReport)** for PR 16654 at commit

2017-01-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 Jenkins, retest this please

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71707/ Test FAILed.

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71707/testReport)** for PR 16654 at commit

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Merged build finished. Test FAILed.

2017-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71698/ Test FAILed.

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)** for PR 16654 at commit

2017-01-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)** for PR 16654 at commit