[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-19 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-220446944
  
@jkbradley see the discussion/proposed solutions on 
https://github.com/apache/spark/pull/12914


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13148


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-19 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-220442030
  
LGTM
Merging with master and branch-2.0
Thanks!

@holdenk We can discuss this Pyspark Params issue more.  I agree it will be 
important to offer users an easy way to view the defaults in PySpark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-18 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13148#discussion_r63799888
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends 
Params
   with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
 
   /**
-   * Set the number of clusters to create (k). Must be > 1. Default: 2.
+   * The desired number of leaf clusters. Must be > 1. Default: 4.
+   * The actual number could be smaller if there are no divisible leaf 
clusters.
* @group param
*/
   @Since("2.0.0")
-  final val k = new IntParam(this, "k", "number of clusters to create", 
(x: Int) => x > 1)
+  final val k = new IntParam(this, "k", "The desired number of leaf 
clusters. " +
+"Must be > 1. Default: 4.", ParamValidators.gt(1))
--- End diff --

For what use cases?  I could imagine
```
binarizer = Binarizer()
binarizer.threshold  # prints Param fields, including built-in doc
```

Maybe instead we should add a method explain taking a string:
```
binarizer.explainParam("threshold")  # This currently requires a Param, not 
a String
```
or
```
binarizer.explain("threshold")  # This could be a shorter alias.
```

I'd prefer not to specify defaults in extra places since it is easy to type 
the wrong value or get them out of sync if defaults change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13148#discussion_r63789574
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends 
Params
   with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
 
   /**
-   * Set the number of clusters to create (k). Must be > 1. Default: 2.
+   * The desired number of leaf clusters. Must be > 1. Default: 4.
+   * The actual number could be smaller if there are no divisible leaf 
clusters.
* @group param
*/
   @Since("2.0.0")
-  final val k = new IntParam(this, "k", "number of clusters to create", 
(x: Int) => x > 1)
+  final val k = new IntParam(this, "k", "The desired number of leaf 
clusters. " +
+"Must be > 1. Default: 4.", ParamValidators.gt(1))
--- End diff --

cc @sethah / @MLnick who have also been thinking about default values in 
PyDocs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13148#discussion_r63756694
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends 
Params
   with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
 
   /**
-   * Set the number of clusters to create (k). Must be > 1. Default: 2.
+   * The desired number of leaf clusters. Must be > 1. Default: 4.
+   * The actual number could be smaller if there are no divisible leaf 
clusters.
* @group param
*/
   @Since("2.0.0")
-  final val k = new IntParam(this, "k", "number of clusters to create", 
(x: Int) => x > 1)
+  final val k = new IntParam(this, "k", "The desired number of leaf 
clusters. " +
+"Must be > 1. Default: 4.", ParamValidators.gt(1))
--- End diff --

Built in documentation is also displayed in PyDocs and having the default 
values there is incredibly useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219927964
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58743/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219927902
  
**[Test build #58743 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58743/consoleFull)**
 for PR 13148 at commit 
[`c0f3bf7`](https://github.com/apache/spark/commit/c0f3bf753930c04252885b1260449348fa2b9f61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219927963
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219923723
  
**[Test build #58743 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58743/consoleFull)**
 for PR 13148 at commit 
[`c0f3bf7`](https://github.com/apache/spark/commit/c0f3bf753930c04252885b1260449348fa2b9f61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13148#discussion_r63600571
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -39,23 +39,27 @@ private[clustering] trait BisectingKMeansParams extends 
Params
   with HasMaxIter with HasFeaturesCol with HasSeed with HasPredictionCol {
 
   /**
-   * Set the number of clusters to create (k). Must be > 1. Default: 2.
+   * The desired number of leaf clusters. Must be > 1. Default: 4.
+   * The actual number could be smaller if there are no divisible leaf 
clusters.
* @group param
*/
   @Since("2.0.0")
-  final val k = new IntParam(this, "k", "number of clusters to create", 
(x: Int) => x > 1)
+  final val k = new IntParam(this, "k", "The desired number of leaf 
clusters. " +
+"Must be > 1. Default: 4.", ParamValidators.gt(1))
--- End diff --

No need to state default in built-in doc.  Scala doc is OK.  Built-in doc 
is only displayed by explainParams, which reads from the defaultParamMap 
already.  Same for the other param changes below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219848715
  
That's the only issue I saw.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219846049
  
I'll take a look now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219684646
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58684/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219684645
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219684430
  
**[Test build #58684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58684/consoleFull)**
 for PR 13148 at commit 
[`de584aa`](https://github.com/apache/spark/commit/de584aa001eac755493e48d1ed0f15c4f1913591).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13148#issuecomment-219675843
  
**[Test build #58684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58684/consoleFull)**
 for PR 13148 at commit 
[`de584aa`](https://github.com/apache/spark/commit/de584aa001eac755493e48d1ed0f15c4f1913591).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit...

2016-05-17 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/13148

[SPARK-15361] [ML] ML 2.0 QA: Scala APIs audit for clustering

## What changes were proposed in this pull request?
Audit Scala API for ml.clustering.
Fix some wrong API documentations and update outdated one.

## How was this patch tested?
Existing unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-15361

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13148


commit de584aa001eac755493e48d1ed0f15c4f1913591
Author: Yanbo Liang 
Date:   2016-05-17T10:03:51Z

ML 2.0 QA: Scala APIs audit for clustering




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org