[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-175381442
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50158/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-175381440
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-175380873
  
**[Test build #50158 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50158/consoleFull)**
 for PR 10788 at commit 
[`8016ad8`](https://github.com/apache/spark/commit/8016ad814a359e2e8d300c84b52a1a021f13b9dc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50931610
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -343,22 +355,36 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
+  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
   /*
  For binary logistic regression, when we initialize the coefficients as zeros,
  it will converge faster if we initialize the intercept such that
  it follows the distribution of the labels.
 
  {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
+ P(0) = 1 / (1 + \exp(b)), and
+ P(1) = \exp(b) / (1 + \exp(b))
  }}}, hence
  {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
+ b = \log{P(1) / P(0)} = \log{count_1 / count_0}
--- End diff --

put two spaces back.
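
For context, the warm-start branch in the diff above copies the initial model's coefficients into the optimizer's starting vector and then places the intercept in the last slot. The following is a minimal standalone sketch of that idea, not code from the PR: `WarmStartSketch`, `initialVector`, and the `(coefficients, intercept)` tuple are hypothetical stand-ins for the Spark types. The sketch writes the intercept with an assignment (`=`); the quoted diff uses `==`, which in Scala only compares the two values and discards the result. On a size mismatch the sketch simply warns and leaves zeros, which is a simplification.

```scala
object WarmStartSketch {
  /** Builds the starting vector [coefficients..., intercept] for the optimizer.
    * `initial` is a hypothetical stand-in for the optional initial model.
    */
  def initialVector(
      numFeatures: Int,
      fitIntercept: Boolean,
      initial: Option[(Array[Double], Double)]): Array[Double] = {
    // Start from zeros, with one extra slot when an intercept is fitted.
    val out = Array.fill(if (fitIntercept) numFeatures + 1 else numFeatures)(0.0)
    initial match {
      case Some((coefficients, intercept)) if coefficients.length == numFeatures =>
        Array.copy(coefficients, 0, out, 0, numFeatures)
        // Assignment, not comparison: `==` here would leave the slot at 0.0.
        if (fitIntercept) out(numFeatures) = intercept
      case Some((coefficients, _)) =>
        // Size mismatch: warn and keep the zero vector (simplified fallback).
        Console.err.println(
          s"Initial coefficients of size ${coefficients.length} " +
            s"did not match the expected size $numFeatures")
      case None => ()
    }
    out
  }
}
```

For example, `WarmStartSketch.initialVector(2, fitIntercept = true, Some((Array(0.5, -1.0), 2.0)))` yields a three-element array whose last entry is the supplied intercept.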





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50931631
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -343,22 +355,36 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
+  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
   /*
  For binary logistic regression, when we initialize the coefficients as zeros,
  it will converge faster if we initialize the intercept such that
  it follows the distribution of the labels.
 
  {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
+ P(0) = 1 / (1 + \exp(b)), and
+ P(1) = \exp(b) / (1 + \exp(b))
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50931598
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -343,22 +355,36 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
+  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
   /*
  For binary logistic regression, when we initialize the coefficients as zeros,
  it will converge faster if we initialize the intercept such that
  it follows the distribution of the labels.
 
  {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
+ P(0) = 1 / (1 + \exp(b)), and
+ P(1) = \exp(b) / (1 + \exp(b))
  }}}, hence
  {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
+ b = \log{P(1) / P(0)} = \log{count_1 / count_0}
  }}}
*/
-  initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
-histogram(1) / histogram(0))
+  initialCoefficientsWithIntercept.toArray(numFeatures)
+= math.log(histogram(1) / histogram(0))
--- End diff --

revert this change.
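
For context, the intercept initialization described in the comment block of the diff above follows from the two formulas it quotes: dividing P(1) by P(0) gives exp(b), so b = log(count_1 / count_0). The following is a minimal standalone sketch, not code from the PR; `InterceptInitSketch`, `initialIntercept`, and `probabilityOfOne` are hypothetical names, and the class counts are passed in directly instead of being read from a label histogram.

```scala
object InterceptInitSketch {
  /** b = log(count_1 / count_0), the log-odds of the positive class. */
  def initialIntercept(count0: Double, count1: Double): Double = {
    // Assumption of this sketch: both classes occur, so the ratio is finite.
    require(count0 > 0 && count1 > 0, "both classes must be present")
    math.log(count1 / count0)
  }

  /** P(1) = exp(b) / (1 + exp(b)) with all coefficients at zero. */
  def probabilityOfOne(b: Double): Double =
    math.exp(b) / (1.0 + math.exp(b))
}
```

With 30 negative and 10 positive labels, `initialIntercept(30, 10)` equals `log(1/3)`, and `probabilityOfOne` of that value recovers the positive-class fraction 0.25, so the zero-coefficient model already predicts the label distribution before the first iteration.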





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50931620
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -343,22 +355,36 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size != numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
+  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
   /*
  For binary logistic regression, when we initialize the coefficients as zeros,
  it will converge faster if we initialize the intercept such that
  it follows the distribution of the labels.
 
  {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
+ P(0) = 1 / (1 + \exp(b)), and
--- End diff --

put two spaces back.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-175341287
  
**[Test build #50158 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50158/consoleFull)**
 for PR 10788 at commit 
[`8016ad8`](https://github.com/apache/spark/commit/8016ad814a359e2e8d300c84b52a1a021f13b9dc).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-175341628
  
Thanks. Merged into master.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10788





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174662125
  
I'm going through the caching logic now. Will let you know soon. Thanks.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174682293
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174665118
  
**[Test build #50011 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50011/consoleFull)**
 for PR 10788 at commit 
[`e6b797a`](https://github.com/apache/spark/commit/e6b797a51696238c3b7b369c77be9763e7d70b52).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174682157
  
**[Test build #50011 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50011/consoleFull)**
 for PR 10788 at commit 
[`e6b797a`](https://github.com/apache/spark/commit/e6b797a51696238c3b7b369c77be9763e7d70b52).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174682295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50011/





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174661379
  
@dbtsai I should have addressed the style concerns now; let me know if anything 
else shows up :)





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174714408
  
**[Test build #50024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50024/consoleFull)**
 for PR 10788 at commit 
[`e6b797a`](https://github.com/apache/spark/commit/e6b797a51696238c3b7b369c77be9763e7d70b52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174715017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50024/





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174688059
  
Jenkins retest this please





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174690199
  
**[Test build #50024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50024/consoleFull)**
 for PR 10788 at commit 
[`e6b797a`](https://github.com/apache/spark/commit/e6b797a51696238c3b7b369c77be9763e7d70b52).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-174715010
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369722
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the coefficients as zeros,
- it will converge faster if we initialize the intercept such that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients != numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients == numFeatures) {
+  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the coefficients as zeros,
+   * it will converge faster if we initialize the intercept such that
+   * it follows the distribution of the labels.
+
+   * {{{
+   * P(0) = 1 / (1 + \exp(b)), and
+   * P(1) = \exp(b) / (1 + \exp(b))
+   * }}}, hence
+   * {{{
+   * b = \log{P(1) / P(0)} = \log{count_1 / count_0}
+   * }}}
*/
-  initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
-histogram(1) / histogram(0))
+  initialCoefficientsWithIntercept.toArray(numFeatures)
+  = math.log(histogram(1) / histogram(0))
 }
 
 val states = optimizer.iterations(new CachedDiffFunction(costFun),
   initialCoefficientsWithIntercept.toBreeze.toDenseVector)
 
-/*
-   Note that in Logistic Regression, the objective history (loss + regularization)
-   is log-likelihood which is invariance under feature standardization. As a result,
-   the objective history from optimizer is the same as the one in the original space.
+/**
+ * Note that in Logistic Regression, the objective history (loss + regularization)
--- End diff --

reverse the style change





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369730
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -374,11 +395,11 @@ class LogisticRegression @Since("1.2.0") (
   throw new SparkException(msg)
 }
 
-/*
-   The coefficients are trained in the scaled space; we're converting them back to
-   the original space.
-   Note that the intercept in scaled space and original space is the same;
-   as a result, no scaling is needed.
+/**
+ * The coefficients are trained in the scaled space; we're converting them back to
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372397
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the coefficients as zeros,
- it will converge faster if we initialize the intercept such that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients != numFeatures) {
+  val vec = optInitialModel.get.coefficients
--- End diff --

It's used on L348 in the log warning.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173506673
  
**[Test build #49871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)**
 for PR 10788 at commit 
[`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372852
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
--- End diff --

Good point. A previous version of the code passed `handlePersistence` down 
to avoid this; I've updated it to do the same here.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370169
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
--- End diff --

Will this cause double caching? Say the input RDD is already cached, so 
`handlePersistence` is false. The DataFrame `df` derived from it still reports 
`StorageLevel.NONE`, so the storage-level check in ml's logistic regression 
code will be true, and the data will be cached twice.
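
A minimal sketch of the concern, using a toy model rather than Spark itself. The names `Data`, `innerTrainAuto`, `innerTrain`, `outerAuto`, and `outerExplicit` are hypothetical and exist only for illustration. The sketch assumes that a derived dataset does not inherit the cache flag of its parent, which is the situation described above.

```scala
// Toy model of the caching hand-off; these are NOT Spark classes.
object DoubleCachingSketch {
  // Stand-in for an RDD/DataFrame that tracks whether it is persisted.
  final class Data(val name: String, var cached: Boolean = false) {
    // Deriving a new dataset (like rdd.toDF()) starts out uncached.
    def derive(newName: String): Data = new Data(newName)
  }

  // Inner trainer that decides on its own whether to cache its input,
  // mirroring a storage-level check. Returns the names it cached.
  def innerTrainAuto(df: Data): List[String] =
    if (!df.cached) { df.cached = true; List(df.name) } else Nil

  // Variant where the caller passes the decision down explicitly.
  def innerTrain(df: Data, handlePersistence: Boolean): List[String] =
    if (handlePersistence) { df.cached = true; List(df.name) } else Nil

  // Outer wrapper around the auto-detecting trainer: `df` gets cached
  // even when the upstream `input` is already cached.
  def outerAuto(input: Data): List[String] =
    innerTrainAuto(input.derive("df"))

  // Outer wrapper that computes the flag once from `input` and passes it down.
  def outerExplicit(input: Data): List[String] = {
    val handlePersistence = !input.cached
    innerTrain(input.derive("df"), handlePersistence)
  }
}
```

With an already-cached input, `outerAuto` caches the derived dataset a second time, while `outerExplicit` caches nothing.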





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370273
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
--- End diff --

```scala
val weights = Vectors.dense(mlLogisticRegresionModel.coefficients.toArray)
```
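
For comparison, a self-contained sketch of the two conversion styles. `Vec`, `Dense`, and `Sparse` are hypothetical stand-ins, not the Spark mllib vector types; the point is only that converting through `toArray` gives the same dense result as the pattern match in the quoted diff.

```scala
object DenseConversionSketch {
  // Minimal stand-ins for dense and sparse vectors; not the Spark types.
  sealed trait Vec { def size: Int; def toArray: Array[Double] }

  final case class Dense(values: Array[Double]) extends Vec {
    def size: Int = values.length
    // In this toy model the backing array is returned without copying.
    def toArray: Array[Double] = values
  }

  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
    // Expand the active entries into a full-length array of zeros.
    def toArray: Array[Double] = {
      val out = new Array[Double](size)
      indices.zip(values).foreach { case (i, v) => out(i) = v }
      out
    }
  }

  // Pattern-matching version, in the style of the quoted diff.
  def toDenseByMatch(v: Vec): Dense = v match {
    case d: Dense => d
    case other => Dense(other.toArray)
  }

  // One-line version, in the spirit of the suggestion above.
  def toDenseDirect(v: Vec): Dense = Dense(v.toArray)
}
```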





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519318
  
**[Test build #49871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49871/consoleFull)** for PR 10788 at commit [`46ae406`](https://github.com/apache/spark/commit/46ae406e7d9935ba2d75a092e98622578fb4ce15).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519472
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49871/
Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50370414
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
+  case x: DenseVector => x
+  case y: Vector => Vectors.dense(y.toArray)
+}
+createModel(weights, mlLogisticRegresionModel.intercept)
+  }
+  optimizer.getUpdater() match {
--- End diff --

When `optimizer.getRegParam() == 0.0`, run the old version.
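
A rough sketch of what such a dispatch could look like, in plain Scala. `Impl`, `MlImpl`, `MllibImpl`, and `chooseImpl` are hypothetical names, and the rationale in the comment (no penalty means nothing to keep away from the intercept) is an assumption about the intent, not something stated in the PR.

```scala
object DispatchSketch {
  sealed trait Impl
  case object MlImpl extends Impl    // hypothetical label for the ml code path
  case object MllibImpl extends Impl // hypothetical label for the old mllib code path

  // Chooses a code path. Assumption: with regParam == 0.0 there is no
  // regularization penalty at all, so the old path is sufficient.
  def chooseImpl(numOfLinearPredictor: Int, regParam: Double, knownUpdater: Boolean): Impl =
    if (numOfLinearPredictor == 1 && regParam != 0.0 && knownUpdater) MlImpl
    else MllibImpl
}
```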





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173493541
  
LGTM except for some styling issues and the concern about caching twice. Thanks.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50371017
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   *
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val uid = Identifiable.randomUID("logreg-static")
+  lr.setInitialModel(new 
org.apache.spark.ml.classification.LogisticRegressionModel(
+uid, initialWeights, 1.0))
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
+// convert the model
+val weights = mlLogisticRegresionModel.weights match {
+  case x: DenseVector => x
+  case y: Vector => Vectors.dense(y.toArray)
+}
+createModel(weights, mlLogisticRegresionModel.intercept)
+  }
+  optimizer.getUpdater() match {
--- End diff --

Okay, this would make the test harder to write. I don't care about this one for now.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50372566
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
--- End diff --

OK, looking at the rest of the comments in the file and the style guide, most 
of them seem to have the `*`, but I'll put them back in (not having them also 
breaks auto-indent, but that's an Emacs bug).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-173519471
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369078
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
--- End diff --

How can this compile? It should be 
`optInitialModel.get.coefficients.size != numFeatures`.
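
A small sketch of the intended check, in plain Scala without Spark. `initialArray` and its `initial` parameter are hypothetical; they stand in for the optional initial model in the quoted diff. The sketch compares the size of the coefficients with `numFeatures`, and stores the intercept with `=`, whereas the quoted diff uses `==` on that line, which only compares and does not assign.

```scala
object InitialCoefficientsSketch {
  // Builds the starting array [coefficients..., intercept?].
  // `initial` is a hypothetical stand-in for an optional
  // (coefficients, intercept) pair taken from an initial model.
  def initialArray(
      numFeatures: Int,
      fitIntercept: Boolean,
      initial: Option[(Array[Double], Double)]): Array[Double] = {
    val out = new Array[Double](if (fitIntercept) numFeatures + 1 else numFeatures)
    initial match {
      // Compare the *size* of the coefficients with numFeatures,
      // not the coefficient vector itself.
      case Some((coefficients, intercept)) if coefficients.length == numFeatures =>
        Array.copy(coefficients, 0, out, 0, numFeatures)
        // Assignment, so the intercept is actually stored.
        if (fitIntercept) out(numFeatures) = intercept
      // Size mismatch or no initial model: keep the zeros
      // (the quoted diff logs a warning on mismatch).
      case _ => ()
    }
    out
  }
}
```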





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369207
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369141
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
--- End diff --

`vec` is not used.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369668
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
--- End diff --

I think you have to remove all the `*`. I think we decided to write comments 
like this:

```
/*
Start the sentence.
 */
```





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369552
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
--- End diff --

Remove the extra line.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50369516
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -335,31 +342,45 @@ class LogisticRegression @Since("1.2.0") (
 val initialCoefficientsWithIntercept =
   Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else 
numFeatures)
 
-if ($(fitIntercept)) {
-  /*
- For binary logistic regression, when we initialize the 
coefficients as zeros,
- it will converge faster if we initialize the intercept such 
that
- it follows the distribution of the labels.
-
- {{{
-   P(0) = 1 / (1 + \exp(b)), and
-   P(1) = \exp(b) / (1 + \exp(b))
- }}}, hence
- {{{
-   b = \log{P(1) / P(0)} = \log{count_1 / count_0}
- }}}
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
!= numFeatures) {
+  val vec = optInitialModel.get.coefficients
+  logWarning(
+s"Initial coefficients provided ${vec} did not match the 
expected size ${numFeatures}")
+}
+
+if (optInitialModel.isDefined && optInitialModel.get.coefficients 
== numFeatures) {
+  val initialCoefficientsWithInterceptArray = 
initialCoefficientsWithIntercept.toArray
+  optInitialModel.get.coefficients.foreachActive { case (index, 
value) =>
+initialCoefficientsWithInterceptArray(index) = value
+  }
+  if ($(fitIntercept)) {
+initialCoefficientsWithInterceptArray(numFeatures) == 
optInitialModel.get.intercept
+  }
+} else if ($(fitIntercept)) {
+  /**
+   * For binary logistic regression, when we initialize the 
coefficients as zeros,
+   * it will converge faster if we initialize the intercept such 
that
+   * it follows the distribution of the labels.
+
+   * {{{
+   * P(0) = 1 / (1 + \exp(b)), and
+   * P(1) = \exp(b) / (1 + \exp(b))
+   * }}}, hence
+   * {{{
+   * b = \log{P(1) / P(0)} = \log{count_1 / count_0}
+   * }}}
*/
-  initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(
-histogram(1) / histogram(0))
+  initialCoefficientsWithIntercept.toArray(numFeatures)
+  = math.log(histogram(1) / histogram(0))
--- End diff --

add two spaces.
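
For reference, the intercept initialization quoted in the diff above can be written out step by step. This is a sketch of the reasoning in the code comment, assuming all coefficients start at zero so that the intercept `b` alone determines the predicted probabilities:

```latex
\begin{aligned}
P(1) &= \frac{e^{b}}{1 + e^{b}}, \qquad P(0) = \frac{1}{1 + e^{b}} \\
\frac{P(1)}{P(0)} &= e^{b} \\
b &= \log\frac{P(1)}{P(0)} = \log\frac{\mathrm{count}_1}{\mathrm{count}_0}
\end{aligned}
```

As an illustrative case (numbers chosen here, not taken from the PR): with 300 positive and 100 negative labels, `b = log(3) ≈ 1.0986`, so the untrained model already predicts `P(1) = 0.75`, matching the label distribution.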





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172978875
  
**[Test build #49697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49697/consoleFull)** for PR 10788 at commit [`7501b4b`](https://github.com/apache/spark/commit/7501b4b29d0d08d1363cb1f16be1397887a569b1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172979036
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172979039
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49697/
Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172972956
  
**[Test build #49692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49692/consoleFull)** for PR 10788 at commit [`e1b0389`](https://github.com/apache/spark/commit/e1b038926b7506cfa240883ae177785a24cc9870).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172957321
  
**[Test build #49692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49692/consoleFull)** for PR 10788 at commit [`e1b0389`](https://github.com/apache/spark/commit/e1b038926b7506cfa240883ae177785a24cc9870).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172968912
  
**[Test build #49697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49697/consoleFull)** for PR 10788 at commit [`7501b4b`](https://github.com/apache/spark/commit/7501b4b29d0d08d1363cb1f16be1397887a569b1).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172973109
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172973112
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49692/
Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172968528
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172968530
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49696/
Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172459633
  
**[Test build #49586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49586/consoleFull)** for PR 10788 at commit [`43a3a32`](https://github.com/apache/spark/commit/43a3a3246f793d467751f40b4dceba6ccaed394b).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172468869
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49586/
Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172468731
  
**[Test build #49586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49586/consoleFull)** for PR 10788 at commit [`43a3a32`](https://github.com/apache/spark/commit/43a3a3246f793d467751f40b4dceba6ccaed394b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172468868
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50040773
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +329,25 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
--- End diff --

Is this used?





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50041495
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,85 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, userSuppliedWeights = true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logistic regression only supports binary classification currently.
+if (numOfLinearPredictor == 1) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+lr.setStandardization(useFeatureScaling)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
--- End diff --

This is not used anymore.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50041152
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -343,8 +365,8 @@ class LogisticRegression @Since("1.2.0") (
 = math.log(histogram(1) / histogram(0))
 }
 
-val states = optimizer.iterations(new CachedDiffFunction(costFun),
-  initialCoefficientsWithIntercept.toBreeze.toDenseVector)
+  val states = optimizer.iterations(new CachedDiffFunction(costFun),
+initialCoefficientsWithIntercept.toBreeze.toDenseVector)
--- End diff --

Wrong indentation. Remove two spaces.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50041320
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,85 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
--- End diff --

Remove `starting from the initial weights provided.` and add an extra new line here for readability.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-18 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r50041364
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +384,85 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), userSuppliedWeights = false)
+  }
+
+  /**
+   * Run Logistic Regression with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
--- End diff --

Add an extra new line before `If a known updater is...`





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962408
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
--- End diff --

Replace `algorithm` by `Logistic Regression`, and remove `starting from the 
initial weights provided`.

Add a new line between `of LabeledPoint entries` and `If a known updater is 
used`.

Actually, the ml version now supports disabling feature scaling, so please call the ml implementation in this case.
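
One way to read this request: once feature scaling is no longer a blocker, the choice of implementation depends only on the number of classes and on the updater. The sketch below is illustrative only. `Updater`, `Backend`, and `chooseBackend` are hypothetical names, not Spark classes, and mapping the L2 and L1 updaters to elastic-net values of 0.0 and 1.0 is an assumption about how the delegation would be configured.

```scala
// Hypothetical types for illustration; these are not Spark classes.
sealed trait Updater
case object SquaredL2 extends Updater
case object L1 extends Updater
case object Custom extends Updater

sealed trait Backend
final case class UseMl(elasticNetParam: Double) extends Backend
case object UseMllib extends Backend

object BackendChoice {
  // Picks the implementation from the class count and the updater only;
  // feature scaling is deliberately not part of the decision.
  def chooseBackend(numOfLinearPredictor: Int, updater: Updater): Backend =
    if (numOfLinearPredictor != 1) UseMllib // more than two classes
    else updater match {
      case SquaredL2 => UseMl(0.0) // pure L2 penalty
      case L1 => UseMl(1.0)        // pure L1 penalty
      case Custom => UseMllib      // unknown updater: keep the mllib path
    }
}
```

For example, `BackendChoice.chooseBackend(1, SquaredL2)` selects the ml path with an elastic-net value of 0.0, while any multiclass problem stays on the mllib path.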





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962416
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
--- End diff --

`run(input, generateInitialWeights(input), userSuppliedWeights = false)`
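
Naming the boolean argument makes each call site self-describing. A minimal sketch of the pattern, using a hypothetical `Trainer` class rather than the Spark API: two public overloads delegate to one private method, and the returned strings exist only to show which path was taken.

```scala
// Hypothetical trainer used only to illustrate named boolean arguments.
class Trainer {
  // Stand-in for deriving default initial weights from the data.
  private def generateInitialWeights(input: Seq[Double]): Vector[Double] =
    Vector.fill(1)(0.0)

  // No weights supplied: generate them and say so at the call site.
  def run(input: Seq[Double]): String =
    run(input, generateInitialWeights(input), userSuppliedWeights = false)

  // Weights supplied by the caller: pass them through unchanged.
  def run(input: Seq[Double], initialWeights: Vector[Double]): String =
    run(input, initialWeights, userSuppliedWeights = true)

  private def run(
      input: Seq[Double],
      initialWeights: Vector[Double],
      userSuppliedWeights: Boolean): String =
    if (userSuppliedWeights) "user" else "generated"
}
```

Compared with a bare `true` or `false`, the named form lets a reviewer see the intent without looking up the private signature.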


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49963912
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +329,11 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val initialCoefficientsWithIntercept = 
optInitialCoefficients.getOrElse(
+  Vectors.zeros(numFeaturesWithIntercept))
 
--- End diff --

here, 

```scala
val initialCoefficientsWithIntercept =
  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)

if (optInitialModel.isDefined && optInitialModel.get.coefficients.size == numFeatures) {
  // Warm start: copy the supplied coefficients into the initial vector.
  val initialCoefficientsWithInterceptArray = initialCoefficientsWithIntercept.toArray
  optInitialModel.get.coefficients.foreachActive { case (index, value) =>
    initialCoefficientsWithInterceptArray(index) = value
  }
  if ($(fitIntercept)) {
    // The intercept is stored in the last slot.
    initialCoefficientsWithInterceptArray(numFeatures) = optInitialModel.get.intercept
  }
} else if ($(fitIntercept)) {
  /*
     For binary logistic regression, when we initialize the coefficients as zeros,
     it will converge faster if we initialize the intercept such that
     it follows the distribution of the labels.
     {{{
     P(0) = 1 / (1 + \exp(b)), and
     P(1) = \exp(b) / (1 + \exp(b))
     }}}, hence
     {{{
     b = \log{P(1) / P(0)} = \log{count_1 / count_0}
     }}}
   */
  initialCoefficientsWithIntercept.toArray(numFeatures) =
    math.log(histogram(1) / histogram(0))
}
```
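
The same initialization logic can be sketched without Spark so it runs on its own. This is an illustration under stated assumptions, not the PR's code: `InitialModel`, `InitSketch`, and the plain `Array[Double]` representation are hypothetical stand-ins for the model and vector types.

```scala
// Hypothetical stand-in for a previously trained model (not a Spark class).
final case class InitialModel(coefficients: Array[Double], intercept: Double)

object InitSketch {
  /**
   * Builds the starting point for the optimizer.
   * histogram(0) and histogram(1) are the counts of labels 0 and 1.
   */
  def initialCoefficients(
      numFeatures: Int,
      fitIntercept: Boolean,
      histogram: Array[Double],
      optInitialModel: Option[InitialModel]): Array[Double] = {
    val init = Array.fill(if (fitIntercept) numFeatures + 1 else numFeatures)(0.0)
    optInitialModel match {
      case Some(model) if model.coefficients.length == numFeatures =>
        // Warm start: copy the supplied coefficients, then the intercept.
        Array.copy(model.coefficients, 0, init, 0, numFeatures)
        if (fitIntercept) init(numFeatures) = model.intercept
      case _ if fitIntercept =>
        // No usable initial model: start the intercept at log(count_1 / count_0).
        init(numFeatures) = math.log(histogram(1) / histogram(0))
      case _ => // zeros are already in place
    }
    init
  }
}
```

A model whose coefficient count does not match `numFeatures` falls through to the label-distribution branch, mirroring the `else if` in the suggestion above.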





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962448
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
--- End diff --

`run(input, initialWeights, userSuppliedWeights = true)`





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962512
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
--- End diff --

Replace `algorithm` by `Logistic Regression`, and add a new line.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172427291
  
**[Test build #49568 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49568/consoleFull)**
 for PR 10788 at commit 
[`4caab8c`](https://github.com/apache/spark/commit/4caab8ca2ac23f24fe84cf741ab2c013e319752d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962541
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
--- End diff --

You can remove `useFeatureScaling` from this condition and pass it to the ML 
implementation via `setStandardization` instead.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172427338
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172427339
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49568/
Test PASSed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962990
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
--- End diff --

same





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961906
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,10 +247,30 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialWeights: Option[Vector] = None
+  /** @group setParam */
+  private[spark] def setInitialWeights(value: Vector): this.type = {
+this.optInitialWeights = Some(value)
+this
+  }
--- End diff --

So we have `setInitialWeights` on `StreamingLogisticRegressionWithSGD`. Would 
it be better to have this match `StreamingLogisticRegressionWithSGD`?





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961903
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,10 +247,31 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialCoefficients: Option[Vector] = None
+  /** @group setParam */
+  private[spark] def setInitialWeights(value: Vector): this.type = {
+this.optInitialCoefficients = Some(value)
+this
+  }
+
+  /**
+   * Validate the initial weights, return an Option, if not the expected 
size return None
+   * and log a warning.
+   */
+  private def validateWeights(vectorOpt: Option[Vector], numFeatures: 
Int): Option[Vector] = {
+vectorOpt.flatMap(vec =>
+  if (vec.size == numFeatures) {
+Some(vec)
+  } else {
+logWarning(
+  s"""Initial weights provided (${vec})did not match the expected 
size ${numFeatures}""")
--- End diff --

By the way, why use `s"""` here? Also, change `weights` to `coefficients`.
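
A minimal sketch of how the two suggestions could combine, together with the rename to `validateCoefficients` proposed elsewhere in this review. It is not the PR's code: Scala's built-in `Vector[Double]` stands in for Spark's `Vector`, and `logWarning` here is a hypothetical stand-in for Spark's logging that only collects messages. A single-line `s"..."` interpolator is enough because the message has no embedded quotes or line breaks.

```scala
object CoefficientValidation {
  // Hypothetical stand-in for Spark's logWarning: collects messages for inspection.
  val warnings = scala.collection.mutable.ArrayBuffer.empty[String]
  private def logWarning(msg: String): Unit = warnings += msg

  /**
   * Returns the coefficients only when their size matches numFeatures;
   * otherwise logs a warning and returns None.
   */
  def validateCoefficients(
      vectorOpt: Option[Vector[Double]],
      numFeatures: Int): Option[Vector[Double]] = {
    vectorOpt.flatMap { vec =>
      if (vec.size == numFeatures) {
        Some(vec)
      } else {
        // Plain s"..." suffices; triple quotes are only needed for quotes or newlines.
        logWarning(
          s"Initial coefficients provided ($vec) did not match the expected size $numFeatures")
        None
      }
    }
  }
}
```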





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961912
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,10 +247,31 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialCoefficients: Option[Vector] = None
+  /** @group setParam */
+  private[spark] def setInitialWeights(value: Vector): this.type = {
+this.optInitialCoefficients = Some(value)
+this
+  }
+
+  /**
+   * Validate the initial weights, return an Option, if not the expected 
size return None
+   * and log a warning.
+   */
+  private def validateWeights(vectorOpt: Option[Vector], numFeatures: 
Int): Option[Vector] = {
--- End diff --

validateCoefficients





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172430774
  
**[Test build #49570 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49570/consoleFull)**
 for PR 10788 at commit 
[`0e2ea49`](https://github.com/apache/spark/commit/0e2ea495ad0020f89df9e70653ff380673d3563e).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49963463
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,8 +247,15 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialCoefficients: Option[Vector] = None
+
+  /** @group setParam */
+  private[spark] def setInitialModel(model: LogisticRegressionModel): 
this.type = {
+this.optInitialCoefficients = Some(model.coefficients)
--- End diff --

You don't want to lose the model's intercept information here.
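
One way to read this comment, sketched under assumptions: store the whole initial model rather than only its coefficients, so the intercept is still available when the optimizer's starting vector is built. `Model` and `Estimator` below are hypothetical stand-ins for the ml `LogisticRegressionModel` and `LogisticRegression` (binary case only), and placing the intercept as the last entry of the combined vector is an assumption of this sketch.

```scala
object InitialModelSketch {
  // Hypothetical stand-in for ml's LogisticRegressionModel (binary case only).
  final case class Model(coefficients: Vector[Double], intercept: Double)

  final class Estimator {
    // Keep the whole model so the intercept travels with the coefficients.
    private var optInitialModel: Option[Model] = None

    def setInitialModel(model: Model): this.type = {
      optInitialModel = Some(model)
      this
    }

    /**
     * Starting point for the optimizer. Uses the stored model when its size
     * matches numFeatures, appending the intercept if one is fitted;
     * otherwise falls back to zeros.
     */
    def initialCoefficientsWithIntercept(
        numFeatures: Int,
        fitIntercept: Boolean): Vector[Double] = {
      val size = if (fitIntercept) numFeatures + 1 else numFeatures
      optInitialModel match {
        case Some(m) if m.coefficients.size == numFeatures =>
          if (fitIntercept) m.coefficients :+ m.intercept else m.coefficients
        case _ =>
          Vector.fill(size)(0.0)
      }
    }
  }
}
```

Because the intercept lives in its own field, the size check compares against `numFeatures` directly and needs no `+ 1` adjustment.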





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49963455
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,8 +247,15 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialCoefficients: Option[Vector] = None
--- End diff --

Keep the reference to `InitialModel` here.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172442231
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172442232
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49571/
Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172442212
  
**[Test build #49571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49571/consoleFull)**
 for PR 10788 at commit 
[`67f`](https://github.com/apache/spark/commit/67f2b9d22ddb0e8c8391d5c744b8895e91e4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961714
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,10 +247,30 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialWeights: Option[Vector] = None
+  /** @group setParam */
+  private[spark] def setInitialWeights(value: Vector): this.type = {
+this.optInitialWeights = Some(value)
+this
+  }
--- End diff --

How about we follow https://github.com/apache/spark/pull/8972 and use the 
following code? We can create a separate JIRA for making 
`setInitialModel` public with a sharedParam.

```scala
  private var initialModel: Option[LogisticRegressionModel] = None

  private def setInitialModel(model: LogisticRegressionModel): this.type = {
...
...
this
  }
```





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962987
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
--- End diff --

So the ML code checks the storage level of the DataFrame, which will never 
be cached. Instead we check the user-supplied input: if it is not persisted 
we handle persistence ourselves, and if it is persisted we don't.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962964
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
--- End diff --

haha... yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962944
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +343,12 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val userSuppliedCoefficients = validateWeights(optInitialCoefficients, 
numFeaturesWithIntercept)
--- End diff --

You will know the number of features from the size of the coefficients set 
by `setInitialModel`. There is no ambiguity here since it's binary, and the 
intercept has a separate variable.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49963357
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
--- End diff --

that makes sense. 





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962157
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +343,12 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val userSuppliedCoefficients = validateWeights(optInitialCoefficients, 
numFeaturesWithIntercept)
--- End diff --

Let's handle this through `setInitialModel`, and open a separate PR to make it 
public. Thanks.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962716
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
--- End diff --

Why do we need to do this? I thought this check was already in the ML code.
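
For context, the guard being questioned follows a common pattern: cache the input only when the caller has not already cached it, and afterwards undo only the caching that was added. A minimal stand-alone sketch, assuming the hypothetical names `Cacheable` and `withPersistence`, which are illustrative stand-ins and not part of Spark's API:

```scala
// Hypothetical stand-in for a dataset that can be cached; not a Spark type.
trait Cacheable {
  def isCached: Boolean
  def persist(): Unit
  def unpersist(): Unit
}

object Persistence {
  /** Caches `data` only if the caller has not, and undoes only its own caching. */
  def withPersistence[D <: Cacheable, A](data: D)(body: D => A): A = {
    // Mirrors the `handlePersistence` flag in the diff above.
    val handlePersistence = !data.isCached
    if (handlePersistence) data.persist()
    try body(data)
    finally if (handlePersistence) data.unpersist()
  }
}
```

If the callee already applies the same guard, repeating it in the caller adds nothing, which appears to be the point of the question.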





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962722
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
+}
+lr.setFitIntercept(addIntercept)
+lr.setMaxIter(optimizer.getNumIterations())
+lr.setTol(optimizer.getConvergenceTol())
+// Convert our input into a DataFrame
+val sqlContext = new SQLContext(input.context)
+import sqlContext.implicits._
+val df = input.toDF()
+// Determine if we should cache the DF
+val handlePersistence = input.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  df.persist(StorageLevel.MEMORY_AND_DISK)
+}
+// Train our model
+val mlLogisticRegresionModel = lr.train(df)
+// unpersist if we persisted
+if (handlePersistence) {
+  df.unpersist()
+}
--- End diff --

ditto?





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962737
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +343,12 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val userSuppliedCoefficients = validateWeights(optInitialCoefficients, 
numFeaturesWithIntercept)
--- End diff --

I don't think we know the number of features at that point.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172431217
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49569/
Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172431216
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172437627
  
**[Test build #49571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49571/consoleFull)**
 for PR 10788 at commit 
[`67f`](https://github.com/apache/spark/commit/67f2b9d22ddb0e8c8391d5c744b8895e91e4).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172421474
  
**[Test build #49568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49568/consoleFull)**
 for PR 10788 at commit 
[`4caab8c`](https://github.com/apache/spark/commit/4caab8ca2ac23f24fe84cf741ab2c013e319752d).





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961964
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -247,10 +247,31 @@ class LogisticRegression @Since("1.2.0") (
   @Since("1.5.0")
   override def getThresholds: Array[Double] = super.getThresholds
 
-  override protected def train(dataset: DataFrame): 
LogisticRegressionModel = {
-// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+  private var optInitialCoefficients: Option[Vector] = None
+  /** @group setParam */
+  private[spark] def setInitialWeights(value: Vector): this.type = {
+this.optInitialCoefficients = Some(value)
+this
+  }
+
+  /**
+   * Validate the initial weights, return an Option, if not the expected 
size return None
+   * and log a warning.
+   */
+  private def validateWeights(vectorOpt: Option[Vector], numFeatures: 
Int): Option[Vector] = {
+vectorOpt.flatMap(vec =>
+  if (vec.size == numFeatures) {
+Some(vec)
+  } else {
+logWarning(
+  s"""Initial weights provided (${vec})did not match the expected 
size ${numFeatures}""")
+None
+  })
+  }
+
+  override protected[spark] def train(dataset: DataFrame): 
LogisticRegressionModel = {
 val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol))
-val instances: RDD[Instance] = dataset.select(col($(labelCol)), w, 
col($(featuresCol))).map {
+val instances = dataset.select(col($(labelCol)), w, 
col($(featuresCol))).map {
--- End diff --

Why was this line changed?





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49962879
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
--- End diff --

I'm assuming you meant `run(input, initialWeights, userSuppliedWeights = true)`





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49963160
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +343,12 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val userSuppliedCoefficients = validateWeights(optInitialCoefficients, 
numFeaturesWithIntercept)
--- End diff --

Ah, then there is no validation step; we just assume that if they set the 
initial model, they set a valid initial model. Ok :)





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49964184
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -322,10 +329,11 @@ class LogisticRegression @Since("1.2.0") (
   new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, 
$(tol))
 }
 
-val initialCoefficientsWithIntercept =
-  Vectors.zeros(if ($(fitIntercept)) numFeatures + 1 else numFeatures)
+val numFeaturesWithIntercept = if ($(fitIntercept)) numFeatures + 1 
else numFeatures
+val initialCoefficientsWithIntercept = 
optInitialCoefficients.getOrElse(
+  Vectors.zeros(numFeaturesWithIntercept))
 
--- End diff --

By the way, we may want to log a warning here: if `optInitialModel.isDefined && 
optInitialModel.get.coefficients.size != numFeatures`, let's log it.
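
A minimal sketch of the size check with logging suggested here, assuming plain arrays in place of Spark's `Vector`, and a hypothetical `resolve` helper and `warn` callback (neither is part of the PR). It checks the flat coefficients-with-intercept vector, as in the diff above, rather than a model object:

```scala
// Hypothetical stand-alone sketch; plain arrays stand in for Spark's Vector.
object InitialCoefficients {
  /** Returns the supplied coefficients if their size matches, otherwise zeros. */
  def resolve(
      supplied: Option[Array[Double]],
      numFeatures: Int,
      fitIntercept: Boolean,
      warn: String => Unit = msg => Console.err.println(s"WARN: $msg")): Array[Double] = {
    // One extra slot holds the intercept when it is being fitted.
    val expectedSize = if (fitIntercept) numFeatures + 1 else numFeatures
    supplied match {
      case Some(coefs) if coefs.length == expectedSize => coefs
      case Some(coefs) =>
        // Size mismatch: log it and fall back to the all-zeros default.
        warn(s"Initial coefficients have size ${coefs.length}, expected $expectedSize; using zeros.")
        Array.fill(expectedSize)(0.0)
      case None => Array.fill(expectedSize)(0.0)
    }
  }
}
```

Falling back to zeros keeps training running with the default starting point, while the warning makes the ignored input visible to the user.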






[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172434715
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49570/
Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10788#discussion_r49961836
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -374,4 +383,82 @@ class LogisticRegressionWithLBFGS
   new LogisticRegressionModel(weights, intercept, numFeatures, 
numOfLinearPredictor + 1)
 }
   }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * If using ml implementation, uses ml code to generate initial weights.
+   */
+  override def run(input: RDD[LabeledPoint]): LogisticRegressionModel = {
+run(input, generateInitialWeights(input), false)
+  }
+
+  /**
+   * Run the algorithm with the configured parameters on an input RDD
+   * of LabeledPoint entries starting from the initial weights provided.
+   * If a known updater is used calls the ml implementation, to avoid
+   * applying a regularization penalty to the intercept, otherwise
+   * defaults to the mllib implementation. If more than two classes
+   * or feature scaling is disabled, always uses mllib implementation.
+   * Uses user provided weights.
+   */
+  override def run(input: RDD[LabeledPoint], initialWeights: Vector): 
LogisticRegressionModel = {
+run(input, initialWeights, true)
+  }
+
+  private def run(input: RDD[LabeledPoint], initialWeights: Vector, 
userSuppliedWeights: Boolean):
+  LogisticRegressionModel = {
+// ml's Logisitic regression only supports binary classifcation 
currently.
+if (numOfLinearPredictor == 1 && useFeatureScaling) {
+  def runWithMlLogisitcRegression(elasticNetParam: Double) = {
+// Prepare the ml LogisticRegression based on our settings
+val lr = new 
org.apache.spark.ml.classification.LogisticRegression()
+lr.setRegParam(optimizer.getRegParam())
+lr.setElasticNetParam(elasticNetParam)
+if (userSuppliedWeights) {
+  val initialWeightsWithIntercept = if (addIntercept) {
+appendBias(initialWeights)
+  } else {
+initialWeights
+  }
+  lr.setInitialWeights(initialWeightsWithIntercept)
--- End diff --

This would become:

```scala
lr.setInitialModel(
  new org.apache.spark.ml.classification.LogisticRegressionModel(uid, initialWeights, 1))
```





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172434712
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7780][MLLIB] intercept in logisticregre...

2016-01-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10788#issuecomment-172434615
  
**[Test build #49570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49570/consoleFull)**
 for PR 10788 at commit 
[`0e2ea49`](https://github.com/apache/spark/commit/0e2ea495ad0020f89df9e70653ff380673d3563e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




