[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119748738
  
@holdenk I assigned https://issues.apache.org/jira/browse/SPARK-8913 to 
you. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7080


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119747807
  
I could tackle the test cleanup since I've got a related PR anyways.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r34204113
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -232,16 +236,26 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
 val interceptR = 2.8366423
 val weightsR = Array(-0.5895848, 0.8931147, -0.3925051, -0.7996864)
 
-assert(model.intercept ~== interceptR relTol 1E-3)
-assert(model.weights(0) ~== weightsR(0) relTol 1E-3)
-assert(model.weights(1) ~== weightsR(1) relTol 1E-3)
-assert(model.weights(2) ~== weightsR(2) relTol 1E-3)
-assert(model.weights(3) ~== weightsR(3) relTol 1E-3)
+assert(model1.intercept ~== interceptR relTol 1E-3)
+assert(model1.weights(0) ~== weightsR(0) relTol 1E-3)
--- End diff --

BTW, this will be tricky with L1 since some of the coefficients can be 
zero. We can workaround this with `absTol`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119735922
  
@mengxr Let's merge it first, and I will find someone to work on this this 
weekend. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119724767
  
@dbtsai LGTM except that we could simplify the test code by a lot. We can 
either merge it and address it in a follow-up PR (and one for the model 
formulation), or continue improving this one. Which do you prefer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-08 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r34195242
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -232,16 +236,26 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
 val interceptR = 2.8366423
 val weightsR = Array(-0.5895848, 0.8931147, -0.3925051, -0.7996864)
 
-assert(model.intercept ~== interceptR relTol 1E-3)
-assert(model.weights(0) ~== weightsR(0) relTol 1E-3)
-assert(model.weights(1) ~== weightsR(1) relTol 1E-3)
-assert(model.weights(2) ~== weightsR(2) relTol 1E-3)
-assert(model.weights(3) ~== weightsR(3) relTol 1E-3)
+assert(model1.intercept ~== interceptR relTol 1E-3)
+assert(model1.weights(0) ~== weightsR(0) relTol 1E-3)
--- End diff --

For the tests, it might be simpler to make `weightR` a `DenseVector` then 
use assert(model1.weights ~== weightsR relTol ...)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119443905
  
  [Test build #36754 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36754/console)
 for   PR 7080 at commit 
[`877e6c7`](https://github.com/apache/spark/commit/877e6c777874c3684c62b94a8704c8a1866defd9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119444002
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119411086
  
  [Test build #36754 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36754/consoleFull)
 for   PR 7080 at commit 
[`877e6c7`](https://github.com/apache/spark/commit/877e6c777874c3684c62b94a8704c8a1866defd9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119411025
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119411012
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-07 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-119410885
  
BTW: Will have more documentation in the followup PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-118171388
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-118171249
  
  [Test build #36424 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36424/console)
 for   PR 7080 at commit 
[`ed4e14a`](https://github.com/apache/spark/commit/ed4e14a6708f4384408e572884fc78244caeb0f2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-118134622
  
  [Test build #36424 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36424/consoleFull)
 for   PR 7080 at commit 
[`ed4e14a`](https://github.com/apache/spark/commit/ed4e14a6708f4384408e572884fc78244caeb0f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-118134102
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-118134008
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33803828
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String)
   def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
   setDefault(fitIntercept -> true)
 
+  /**
+   * Whether to standardize the training features prior to fitting the 
model sequence.
--- End diff --

@dbtsai There may be an issue if we copy some code/text from R, which is 
GPL-licensed. Let's rephrase this sentence.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117988343
  
  [Test build #36366 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36366/console)
 for   PR 7080 at commit 
[`a291cb6`](https://github.com/apache/spark/commit/a291cb679e3286f6486083ef9ab9666144d58e8a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117988519
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117948341
  
  [Test build #36366 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36366/consoleFull)
 for   PR 7080 at commit 
[`a291cb6`](https://github.com/apache/spark/commit/a291cb679e3286f6486083ef9ab9666144d58e8a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117947974
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117948001
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33752237
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -292,20 +317,51 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
  * data.V4 -0.04325749
  * data.V5 -0.02481551
  */
-val interceptR = -0.05627428
-val weightsR = Array(0.0, 0.0, -0.04325749, -0.02481551)
-
-assert(model.intercept ~== interceptR relTol 1E-2)
-assert(model.weights(0) ~== weightsR(0) relTol 1E-3)
-assert(model.weights(1) ~== weightsR(1) relTol 1E-3)
-assert(model.weights(2) ~== weightsR(2) relTol 1E-2)
-assert(model.weights(3) ~== weightsR(3) relTol 2E-2)
+val interceptR1 = -0.05627428
+val weightsR1 = Array(0.0, 0.0, -0.04325749, -0.02481551)
+
+assert(model1.intercept ~== interceptR1 relTol 1E-2)
+assert(model1.weights(0) ~== weightsR1(0) absTol 1E-3)
+assert(model1.weights(1) ~== weightsR1(1) absTol 1E-3)
+assert(model1.weights(2) ~== weightsR1(2) relTol 1E-2)
+assert(model1.weights(3) ~== weightsR1(3) relTol 2E-2)
+
+/**
+ * Using the following R code to load the data and train the model 
using glmnet package.
--- End diff --

well. this is big conflict. will update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33749927
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String)
   def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
   setDefault(fitIntercept -> true)
 
+  /**
+   * Whether to standardize the training features prior to fitting the 
model sequence.
--- End diff --

This is copied from R's description. I think it's about fitting a sequence 
of models with different regularization. I will modify it to `to fitting the 
model`.

```R
standardize 
Logical flag for x variable standardization, prior to fitting the model 
sequence. 
The coefficients are always returned on the original scale. Default is 
standardize=TRUE. 
If variables are in the same units already, you might not wish to 
standardize. 
See details below for y standardization with family="gaussian".
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33739057
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -534,27 +556,41 @@ private class LogisticCostFun(
   case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
 })
 
-// regVal is the sum of weight squares for L2 regularization
-val norm = if (regParamL2 == 0.0) {
-  0.0
-} else if (fitIntercept) {
-  brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size 
-1)).toBreeze, 2.0)
-} else {
-  brzNorm(weights, 2.0)
-}
-val regVal = 0.5 * regParamL2 * norm * norm
+val totalGradientArray = logisticAggregator.gradient.toArray
 
-val loss = logisticAggregator.loss + regVal
-val gradient = logisticAggregator.gradient
-
-if (fitIntercept) {
-  val wArray = w.toArray.clone()
-  wArray(wArray.length - 1) = 0.0
-  axpy(regParamL2, Vectors.dense(wArray), gradient)
+// regVal is the sum of weight squares excluding intercept for L2 
regularization.
+val regVal = if (regParamL2 == 0.0) {
+  0.0
 } else {
-  axpy(regParamL2, w, gradient)
+  var sum = 0.0
+  w.foreachActive { (index, value) =>
+// If `fitIntercept` is true, the last term which is intercept 
doesn't
+// contribute to the regularization.
+if (index != numFeatures) {
+  // The following code will compute the loss of the 
regularization; also
+  // the gradient of the regularization, and add back to 
totalGradientArray.
+  sum += {
+if (standardization) {
+  totalGradientArray(index) += regParamL2 * value
+  value * value
+} else {
+  if (featuresStd(index) != 0.0) {
+// If `standardization` is false, we still standardize the 
data
+// to improve the rate of convergence; as a result, we 
have to
+// perform this reverse standardization by penalizing each 
component
+// differently to get effectively the same objective 
function when
+// the training dataset is not standardized.
--- End diff --

We should check R's implementation and discuss whether this is what users 
expect if they set `standardization` to false.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33739015
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -292,20 +317,51 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
  * data.V4 -0.04325749
  * data.V5 -0.02481551
  */
-val interceptR = -0.05627428
-val weightsR = Array(0.0, 0.0, -0.04325749, -0.02481551)
-
-assert(model.intercept ~== interceptR relTol 1E-2)
-assert(model.weights(0) ~== weightsR(0) relTol 1E-3)
-assert(model.weights(1) ~== weightsR(1) relTol 1E-3)
-assert(model.weights(2) ~== weightsR(2) relTol 1E-2)
-assert(model.weights(3) ~== weightsR(3) relTol 2E-2)
+val interceptR1 = -0.05627428
+val weightsR1 = Array(0.0, 0.0, -0.04325749, -0.02481551)
+
+assert(model1.intercept ~== interceptR1 relTol 1E-2)
+assert(model1.weights(0) ~== weightsR1(0) absTol 1E-3)
+assert(model1.weights(1) ~== weightsR1(1) absTol 1E-3)
+assert(model1.weights(2) ~== weightsR1(2) relTol 1E-2)
+assert(model1.weights(3) ~== weightsR1(3) relTol 2E-2)
+
+/**
+ * Using the following R code to load the data and train the model 
using glmnet package.
--- End diff --

We recently updated the R comment to make it easier to copy & paste. Could 
you remove the leading `*` and `>` from the comments? Also use `/*` instead of 
`/**`. The latter is for JavaDoc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33739010
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String)
   def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
   setDefault(fitIntercept -> true)
 
+  /**
+   * Whether to standardize the training features prior to fitting the 
model sequence.
--- End diff --

What do you mean by `model sequence`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33739011
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -149,15 +158,28 @@ class LogisticRegression(override val uid: String)
 val regParamL1 = $(elasticNetParam) * $(regParam)
 val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
 
-val costFun = new LogisticCostFun(instances, numClasses, 
$(fitIntercept),
+val costFun = new LogisticCostFun(instances, numClasses, 
$(fitIntercept), $(standardization),
   featuresStd, featuresMean, regParamL2)
 
 val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) {
   new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
 } else {
-  // Remove the L1 penalization on the intercept
   def regParamL1Fun = (index: Int) => {
-if (index == numFeatures) 0.0 else regParamL1
+// Remove the L1 penalization on the intercept
+if (index == numFeatures) 0.0 else {
--- End diff --

We either put `if ... else ...` in a single line or use `{..}` for both.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117859864
  
nm, @mengxr  is taking a look instead


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33738207
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String)
   def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
   setDefault(fitIntercept -> true)
 
+  /**
+   * Whether to standardize the training features prior to fitting the 
model sequence.
--- End diff --

"model sequence" may not be understood since we don't provide a sequence 
currently; how about just saying "model?"

Also, to make it clear what is meant by standardizing, how about having a 
link to StandardScaler.withStd?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117859275
  
I'll take a look now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117076337
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117075566
  
  [Test build #36113 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36113/console)
 for   PR 7080 at commit 
[`28c807f`](https://github.com/apache/spark/commit/28c807fd5af7abe5220036ccc3310c5d219a3744).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117063017
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117062924
  
  [Test build #36110 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36110/console)
 for   PR 7080 at commit 
[`8d2d6b7`](https://github.com/apache/spark/commit/8d2d6b796e48721fc1b315bc1b9f1de61d6de714).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117033423
  
  [Test build #36113 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36113/consoleFull)
 for   PR 7080 at commit 
[`28c807f`](https://github.com/apache/spark/commit/28c807fd5af7abe5220036ccc3310c5d219a3744).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117031969
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117031902
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117023138
  
  [Test build #36110 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36110/consoleFull)
 for   PR 7080 at commit 
[`8d2d6b7`](https://github.com/apache/spark/commit/8d2d6b796e48721fc1b315bc1b9f1de61d6de714).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117022970
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117022982
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117016916
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117016901
  
  [Test build #36103 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36103/console)
 for   PR 7080 at commit 
[`cf2569b`](https://github.com/apache/spark/commit/cf2569b7636ee6984b94944262772c694cc6847b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117015646
  
  [Test build #36103 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36103/consoleFull)
 for   PR 7080 at commit 
[`cf2569b`](https://github.com/apache/spark/commit/cf2569b7636ee6984b94944262772c694cc6847b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117015057
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117015008
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-117001396
  
Jenkins, please test again.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116955865
  
Jenkins, please test again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116955166
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116952789
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116952772
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33532181
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -534,27 +554,39 @@ private class LogisticCostFun(
   case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
 })
 
-// regVal is the sum of weight squares for L2 regularization
-val norm = if (regParamL2 == 0.0) {
-  0.0
-} else if (fitIntercept) {
-  brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size 
-1)).toBreeze, 2.0)
-} else {
-  brzNorm(weights, 2.0)
-}
-val regVal = 0.5 * regParamL2 * norm * norm
+val totalGradientArray = logisticAggregator.gradient.toArray
 
-val loss = logisticAggregator.loss + regVal
-val gradient = logisticAggregator.gradient
-
-if (fitIntercept) {
-  val wArray = w.toArray.clone()
-  wArray(wArray.length - 1) = 0.0
-  axpy(regParamL2, Vectors.dense(wArray), gradient)
+// regVal is the sum of weight squares excluding intercept for L2 
regularization.
+val regVal = if (regParamL2 == 0.0) {
+  0.0
 } else {
-  axpy(regParamL2, w, gradient)
--- End diff --

Makes sense, thanks :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33532097
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -208,8 +208,11 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
   }
 
   test("binary logistic regression with intercept without regularization") 
{
-val trainer = (new LogisticRegression).setFitIntercept(true)
-val model = trainer.fit(binaryDataset)
+val trainer1 = (new LogisticRegression).setFitIntercept(true)
--- End diff --

Make sense, and I will add another test testing `standardization == true` 
is default one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33532011
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -534,27 +554,39 @@ private class LogisticCostFun(
   case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
 })
 
-// regVal is the sum of weight squares for L2 regularization
-val norm = if (regParamL2 == 0.0) {
-  0.0
-} else if (fitIntercept) {
-  brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size 
-1)).toBreeze, 2.0)
-} else {
-  brzNorm(weights, 2.0)
-}
-val regVal = 0.5 * regParamL2 * norm * norm
+val totalGradientArray = logisticAggregator.gradient.toArray
 
-val loss = logisticAggregator.loss + regVal
-val gradient = logisticAggregator.gradient
-
-if (fitIntercept) {
-  val wArray = w.toArray.clone()
-  wArray(wArray.length - 1) = 0.0
-  axpy(regParamL2, Vectors.dense(wArray), gradient)
+// regVal is the sum of weight squares excluding intercept for L2 
regularization.
+val regVal = if (regParamL2 == 0.0) {
+  0.0
 } else {
-  axpy(regParamL2, w, gradient)
--- End diff --

I think it will be even faster than the previous one since I don't have a 
copy of wArray anymore, and I compute the gradient and norm in the same pass 
while previously, we looped through the array twice. Since it's axpy, the BLAS 
will not gain too much performance.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread holdenk
Github user holdenk commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116850826
  
Looks good to me with a few minor notes :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33505545
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -534,27 +554,39 @@ private class LogisticCostFun(
   case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
 })
 
-// regVal is the sum of weight squares for L2 regularization
-val norm = if (regParamL2 == 0.0) {
-  0.0
-} else if (fitIntercept) {
-  brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size 
-1)).toBreeze, 2.0)
-} else {
-  brzNorm(weights, 2.0)
-}
-val regVal = 0.5 * regParamL2 * norm * norm
+val totalGradientArray = logisticAggregator.gradient.toArray
 
-val loss = logisticAggregator.loss + regVal
-val gradient = logisticAggregator.gradient
-
-if (fitIntercept) {
-  val wArray = w.toArray.clone()
-  wArray(wArray.length - 1) = 0.0
-  axpy(regParamL2, Vectors.dense(wArray), gradient)
+// regVal is the sum of weight squares excluding intercept for L2 
regularization.
+val regVal = if (regParamL2 == 0.0) {
+  0.0
 } else {
-  axpy(regParamL2, w, gradient)
--- End diff --

Should we keep the previous code for computing the norm/regVal if we are 
performing standardization? Since the bzNorm, axpy, etc. functions are part of 
BLAS they are probably faster than the loop we have?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33505552
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
 ---
@@ -208,8 +208,11 @@ class LogisticRegressionSuite extends SparkFunSuite 
with MLlibTestSparkContext {
   }
 
   test("binary logistic regression with intercept without regularization") 
{
-val trainer = (new LogisticRegression).setFitIntercept(true)
-val model = trainer.fit(binaryDataset)
+val trainer1 = (new LogisticRegression).setFitIntercept(true)
--- End diff --

For consistency with how the tests for setFitIntercept were done, maybe 
have setStandardization(true) here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33505540
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -149,15 +158,26 @@ class LogisticRegression(override val uid: String)
 val regParamL1 = $(elasticNetParam) * $(regParam)
 val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
 
-val costFun = new LogisticCostFun(instances, numClasses, 
$(fitIntercept),
+val costFun = new LogisticCostFun(instances, numClasses, 
$(fitIntercept), $(standardization),
   featuresStd, featuresMean, regParamL2)
 
 val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) {
   new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
 } else {
-  // Remove the L1 penalization on the intercept
   def regParamL1Fun = (index: Int) => {
-if (index == numFeatures) 0.0 else regParamL1
+// Remove the L1 penalization on the intercept
+if (index == numFeatures) 0.0 else {
+  if ($(standardization)) {
+regParamL1
+  } else {
+if (featuresStd(index) != 0.0) {
+  // If `standardization` is false, each component is 
penalized differently to
--- End diff --

Maybe mention that the data is standardized, which is why we are performing 
this reverse standardization for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116638244
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116638099
  
  [Test build #35976 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/console)
 for   PR 7080 at commit 
[`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116591832
  
  [Test build #35976 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/consoleFull)
 for   PR 7080 at commit 
[`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116590952
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7080#issuecomment-116515950
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-06-29 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/7080

[SPARK-8700][ML] Disable feature scaling in Logistic Regression

All compressed sensing applications, and some of the regression use-cases 
will have better result by turning the feature scaling off. However, if we 
implement this naively by training the dataset without doing any 
standardization, the rate of convergency will not be good. This can be 
implemented by still standardizing the training dataset but we penalize each 
component differently to get effectively the same objective function but a 
better numerical problem. As a result, for those columns with high variances, 
they will be penalized less, and vice versa. Without this, since all the 
features are standardized, so they will be penalized the same.

In R, there is an option for this.
`standardize`   
Logical flag for x variable standardization, prior to fitting the model 
sequence. The coefficients are always returned on the original scale. Default 
is standardize=TRUE. If variables are in the same units already, you might not 
wish to standardize. See details below for y standardization with 
family="gaussian".

+cc @holdenk @mengxr @jkbradley 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark lors

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7080


commit 588c75f714372b6da4dd20fa7d006afe399fa8e2
Author: DB Tsai 
Date:   2015-06-24T01:06:03Z

first commit




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org