[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119748738 @holdenk I assigned https://issues.apache.org/jira/browse/SPARK-8913 to you. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7080 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119747807 I could tackle the test cleanup since I've got a related PR anyways. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r34204113 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -232,16 +236,26 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { val interceptR = 2.8366423 val weightsR = Array(-0.5895848, 0.8931147, -0.3925051, -0.7996864) -assert(model.intercept ~== interceptR relTol 1E-3) -assert(model.weights(0) ~== weightsR(0) relTol 1E-3) -assert(model.weights(1) ~== weightsR(1) relTol 1E-3) -assert(model.weights(2) ~== weightsR(2) relTol 1E-3) -assert(model.weights(3) ~== weightsR(3) relTol 1E-3) +assert(model1.intercept ~== interceptR relTol 1E-3) +assert(model1.weights(0) ~== weightsR(0) relTol 1E-3) --- End diff -- BTW, this will be tricky with L1 since some of the coefficients can be zero. We can workaround this with `absTol`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119735922 @mengxr Let's merge it first, and I will find someone to work on this this weekend. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119724767 @dbtsai LGTM except that we could simplify the test code by a lot. We can either merge it and address it in a follow-up PR (and one for the model formulation), or continue improving this one. Which do you prefer? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r34195242 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -232,16 +236,26 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { val interceptR = 2.8366423 val weightsR = Array(-0.5895848, 0.8931147, -0.3925051, -0.7996864) -assert(model.intercept ~== interceptR relTol 1E-3) -assert(model.weights(0) ~== weightsR(0) relTol 1E-3) -assert(model.weights(1) ~== weightsR(1) relTol 1E-3) -assert(model.weights(2) ~== weightsR(2) relTol 1E-3) -assert(model.weights(3) ~== weightsR(3) relTol 1E-3) +assert(model1.intercept ~== interceptR relTol 1E-3) +assert(model1.weights(0) ~== weightsR(0) relTol 1E-3) --- End diff -- For the tests, it might be simpler to make `weightR` a `DenseVector` then use assert(model1.weights ~== weightsR relTol ...)`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119443905 [Test build #36754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36754/console) for PR 7080 at commit [`877e6c7`](https://github.com/apache/spark/commit/877e6c777874c3684c62b94a8704c8a1866defd9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119444002 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119411086 [Test build #36754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36754/consoleFull) for PR 7080 at commit [`877e6c7`](https://github.com/apache/spark/commit/877e6c777874c3684c62b94a8704c8a1866defd9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119411025 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119411012 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-119410885 BTW: Will have more documentation in the followup PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-118171388 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-118171249 [Test build #36424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36424/console) for PR 7080 at commit [`ed4e14a`](https://github.com/apache/spark/commit/ed4e14a6708f4384408e572884fc78244caeb0f2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-118134622 [Test build #36424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36424/consoleFull) for PR 7080 at commit [`ed4e14a`](https://github.com/apache/spark/commit/ed4e14a6708f4384408e572884fc78244caeb0f2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-118134102 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-118134008 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33803828 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String) def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value) setDefault(fitIntercept -> true) + /** + * Whether to standardize the training features prior to fitting the model sequence. --- End diff -- @dbtsai There may be an issue if we copy some code/text from R, which is GPL-licensed. Let's rephrase this sentence. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117988343 [Test build #36366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36366/console) for PR 7080 at commit [`a291cb6`](https://github.com/apache/spark/commit/a291cb679e3286f6486083ef9ab9666144d58e8a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117988519 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117948341 [Test build #36366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36366/consoleFull) for PR 7080 at commit [`a291cb6`](https://github.com/apache/spark/commit/a291cb679e3286f6486083ef9ab9666144d58e8a). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117947974 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117948001 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33752237 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -292,20 +317,51 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { * data.V4 -0.04325749 * data.V5 -0.02481551 */ -val interceptR = -0.05627428 -val weightsR = Array(0.0, 0.0, -0.04325749, -0.02481551) - -assert(model.intercept ~== interceptR relTol 1E-2) -assert(model.weights(0) ~== weightsR(0) relTol 1E-3) -assert(model.weights(1) ~== weightsR(1) relTol 1E-3) -assert(model.weights(2) ~== weightsR(2) relTol 1E-2) -assert(model.weights(3) ~== weightsR(3) relTol 2E-2) +val interceptR1 = -0.05627428 +val weightsR1 = Array(0.0, 0.0, -0.04325749, -0.02481551) + +assert(model1.intercept ~== interceptR1 relTol 1E-2) +assert(model1.weights(0) ~== weightsR1(0) absTol 1E-3) +assert(model1.weights(1) ~== weightsR1(1) absTol 1E-3) +assert(model1.weights(2) ~== weightsR1(2) relTol 1E-2) +assert(model1.weights(3) ~== weightsR1(3) relTol 2E-2) + +/** + * Using the following R code to load the data and train the model using glmnet package. --- End diff -- well. this is big conflict. will update soon. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33749927 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String) def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value) setDefault(fitIntercept -> true) + /** + * Whether to standardize the training features prior to fitting the model sequence. --- End diff -- This is copied from R's description. I think it's about fitting a sequence of models with different regularization. I will modify it to `to fitting the model`. ```R standardize Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian". ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33739057 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -534,27 +556,41 @@ private class LogisticCostFun( case (aggregator1, aggregator2) => aggregator1.merge(aggregator2) }) -// regVal is the sum of weight squares for L2 regularization -val norm = if (regParamL2 == 0.0) { - 0.0 -} else if (fitIntercept) { - brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size -1)).toBreeze, 2.0) -} else { - brzNorm(weights, 2.0) -} -val regVal = 0.5 * regParamL2 * norm * norm +val totalGradientArray = logisticAggregator.gradient.toArray -val loss = logisticAggregator.loss + regVal -val gradient = logisticAggregator.gradient - -if (fitIntercept) { - val wArray = w.toArray.clone() - wArray(wArray.length - 1) = 0.0 - axpy(regParamL2, Vectors.dense(wArray), gradient) +// regVal is the sum of weight squares excluding intercept for L2 regularization. +val regVal = if (regParamL2 == 0.0) { + 0.0 } else { - axpy(regParamL2, w, gradient) + var sum = 0.0 + w.foreachActive { (index, value) => +// If `fitIntercept` is true, the last term which is intercept doesn't +// contribute to the regularization. +if (index != numFeatures) { + // The following code will compute the loss of the regularization; also + // the gradient of the regularization, and add back to totalGradientArray. + sum += { +if (standardization) { + totalGradientArray(index) += regParamL2 * value + value * value +} else { + if (featuresStd(index) != 0.0) { +// If `standardization` is false, we still standardize the data +// to improve the rate of convergence; as a result, we have to +// perform this reverse standardization by penalizing each component +// differently to get effectively the same objective function when +// the training dataset is not standardized. --- End diff -- We should check R's implementation and discuss whether this is what users expect if they set `standardization` to false. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33739015 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -292,20 +317,51 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { * data.V4 -0.04325749 * data.V5 -0.02481551 */ -val interceptR = -0.05627428 -val weightsR = Array(0.0, 0.0, -0.04325749, -0.02481551) - -assert(model.intercept ~== interceptR relTol 1E-2) -assert(model.weights(0) ~== weightsR(0) relTol 1E-3) -assert(model.weights(1) ~== weightsR(1) relTol 1E-3) -assert(model.weights(2) ~== weightsR(2) relTol 1E-2) -assert(model.weights(3) ~== weightsR(3) relTol 2E-2) +val interceptR1 = -0.05627428 +val weightsR1 = Array(0.0, 0.0, -0.04325749, -0.02481551) + +assert(model1.intercept ~== interceptR1 relTol 1E-2) +assert(model1.weights(0) ~== weightsR1(0) absTol 1E-3) +assert(model1.weights(1) ~== weightsR1(1) absTol 1E-3) +assert(model1.weights(2) ~== weightsR1(2) relTol 1E-2) +assert(model1.weights(3) ~== weightsR1(3) relTol 2E-2) + +/** + * Using the following R code to load the data and train the model using glmnet package. --- End diff -- We recently updated the R comment to make it easier to copy & paste. Could you remove the leading `*` and `>` from the comments? Also use `/*` instead of `/**`. The latter is for JavaDoc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33739010 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String) def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value) setDefault(fitIntercept -> true) + /** + * Whether to standardize the training features prior to fitting the model sequence. --- End diff -- What do you mean by `model sequence`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33739011 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -149,15 +158,28 @@ class LogisticRegression(override val uid: String) val regParamL1 = $(elasticNetParam) * $(regParam) val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam) -val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), +val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), $(standardization), featuresStd, featuresMean, regParamL2) val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) { new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol)) } else { - // Remove the L1 penalization on the intercept def regParamL1Fun = (index: Int) => { -if (index == numFeatures) 0.0 else regParamL1 +// Remove the L1 penalization on the intercept +if (index == numFeatures) 0.0 else { --- End diff -- We either put `if ... else ...` in a single line or use `{..}` for both. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117859864 nm, @mengxr is taking a look instead --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33738207 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String) def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value) setDefault(fitIntercept -> true) + /** + * Whether to standardize the training features prior to fitting the model sequence. --- End diff -- "model sequence" may not be understood since we don't provide a sequence currently; how about just saying "model?" Also, to make it clear what is meant by standardizing, how about having a link to StandardScaler.withStd? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117859275 I'll take a look now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117076337 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117075566 [Test build #36113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36113/console) for PR 7080 at commit [`28c807f`](https://github.com/apache/spark/commit/28c807fd5af7abe5220036ccc3310c5d219a3744). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117063017 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117062924 [Test build #36110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36110/console) for PR 7080 at commit [`8d2d6b7`](https://github.com/apache/spark/commit/8d2d6b796e48721fc1b315bc1b9f1de61d6de714). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117033423 [Test build #36113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36113/consoleFull) for PR 7080 at commit [`28c807f`](https://github.com/apache/spark/commit/28c807fd5af7abe5220036ccc3310c5d219a3744). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117031969 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117031902 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117023138 [Test build #36110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36110/consoleFull) for PR 7080 at commit [`8d2d6b7`](https://github.com/apache/spark/commit/8d2d6b796e48721fc1b315bc1b9f1de61d6de714). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117022970 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117022982 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117016916 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117016901 [Test build #36103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36103/console) for PR 7080 at commit [`cf2569b`](https://github.com/apache/spark/commit/cf2569b7636ee6984b94944262772c694cc6847b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117015646 [Test build #36103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36103/consoleFull) for PR 7080 at commit [`cf2569b`](https://github.com/apache/spark/commit/cf2569b7636ee6984b94944262772c694cc6847b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117015057 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117015008 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-117001396 Jenkins, please test again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116955865 Jenkins, please test again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116955166 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116952789 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116952772 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33532181 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -534,27 +554,39 @@ private class LogisticCostFun( case (aggregator1, aggregator2) => aggregator1.merge(aggregator2) }) -// regVal is the sum of weight squares for L2 regularization -val norm = if (regParamL2 == 0.0) { - 0.0 -} else if (fitIntercept) { - brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size -1)).toBreeze, 2.0) -} else { - brzNorm(weights, 2.0) -} -val regVal = 0.5 * regParamL2 * norm * norm +val totalGradientArray = logisticAggregator.gradient.toArray -val loss = logisticAggregator.loss + regVal -val gradient = logisticAggregator.gradient - -if (fitIntercept) { - val wArray = w.toArray.clone() - wArray(wArray.length - 1) = 0.0 - axpy(regParamL2, Vectors.dense(wArray), gradient) +// regVal is the sum of weight squares excluding intercept for L2 regularization. +val regVal = if (regParamL2 == 0.0) { + 0.0 } else { - axpy(regParamL2, w, gradient) --- End diff -- Makes sense, thanks :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33532097 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -208,8 +208,11 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { } test("binary logistic regression with intercept without regularization") { -val trainer = (new LogisticRegression).setFitIntercept(true) -val model = trainer.fit(binaryDataset) +val trainer1 = (new LogisticRegression).setFitIntercept(true) --- End diff -- Make sense, and I will add another test testing `standardization == true` is default one. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33532011 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -534,27 +554,39 @@ private class LogisticCostFun( case (aggregator1, aggregator2) => aggregator1.merge(aggregator2) }) -// regVal is the sum of weight squares for L2 regularization -val norm = if (regParamL2 == 0.0) { - 0.0 -} else if (fitIntercept) { - brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size -1)).toBreeze, 2.0) -} else { - brzNorm(weights, 2.0) -} -val regVal = 0.5 * regParamL2 * norm * norm +val totalGradientArray = logisticAggregator.gradient.toArray -val loss = logisticAggregator.loss + regVal -val gradient = logisticAggregator.gradient - -if (fitIntercept) { - val wArray = w.toArray.clone() - wArray(wArray.length - 1) = 0.0 - axpy(regParamL2, Vectors.dense(wArray), gradient) +// regVal is the sum of weight squares excluding intercept for L2 regularization. +val regVal = if (regParamL2 == 0.0) { + 0.0 } else { - axpy(regParamL2, w, gradient) --- End diff -- I think it will be even faster than the previous one since I don't have a copy of wArray anymore, and I compute the gradient and norm in the same pass while previously, we looped through the array twice. Since it's axpy, the BLAS will not gain too much performance. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116850826 Looks good to me with a few minor notes :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33505545 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -534,27 +554,39 @@ private class LogisticCostFun( case (aggregator1, aggregator2) => aggregator1.merge(aggregator2) }) -// regVal is the sum of weight squares for L2 regularization -val norm = if (regParamL2 == 0.0) { - 0.0 -} else if (fitIntercept) { - brzNorm(Vectors.dense(weights.toArray.slice(0, weights.size -1)).toBreeze, 2.0) -} else { - brzNorm(weights, 2.0) -} -val regVal = 0.5 * regParamL2 * norm * norm +val totalGradientArray = logisticAggregator.gradient.toArray -val loss = logisticAggregator.loss + regVal -val gradient = logisticAggregator.gradient - -if (fitIntercept) { - val wArray = w.toArray.clone() - wArray(wArray.length - 1) = 0.0 - axpy(regParamL2, Vectors.dense(wArray), gradient) +// regVal is the sum of weight squares excluding intercept for L2 regularization. +val regVal = if (regParamL2 == 0.0) { + 0.0 } else { - axpy(regParamL2, w, gradient) --- End diff -- Should we keep the previous code for computing the norm/regVal if we are performing standardization? Since the bzNorm, axpy, etc. functions are part of BLAS they are probably faster than the loop we have? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33505552 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -208,8 +208,11 @@ class LogisticRegressionSuite extends SparkFunSuite with MLlibTestSparkContext { } test("binary logistic regression with intercept without regularization") { -val trainer = (new LogisticRegression).setFitIntercept(true) -val model = trainer.fit(binaryDataset) +val trainer1 = (new LogisticRegression).setFitIntercept(true) --- End diff -- For consistency with how the tests for setFitIntercept were done, maybe have setStandardization(true) here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/7080#discussion_r33505540 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -149,15 +158,26 @@ class LogisticRegression(override val uid: String) val regParamL1 = $(elasticNetParam) * $(regParam) val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam) -val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), +val costFun = new LogisticCostFun(instances, numClasses, $(fitIntercept), $(standardization), featuresStd, featuresMean, regParamL2) val optimizer = if ($(elasticNetParam) == 0.0 || $(regParam) == 0.0) { new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol)) } else { - // Remove the L1 penalization on the intercept def regParamL1Fun = (index: Int) => { -if (index == numFeatures) 0.0 else regParamL1 +// Remove the L1 penalization on the intercept +if (index == numFeatures) 0.0 else { + if ($(standardization)) { +regParamL1 + } else { +if (featuresStd(index) != 0.0) { + // If `standardization` is false, each component is penalized differently to --- End diff -- Maybe mention that the data is standardized, which is why we are performing this reverse standardization for this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116638244 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116638099 [Test build #35976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/console) for PR 7080 at commit [`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116591832 [Test build #35976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35976/consoleFull) for PR 7080 at commit [`588c75f`](https://github.com/apache/spark/commit/588c75f714372b6da4dd20fa7d006afe399fa8e2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116590952 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7080#issuecomment-116515950 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/7080 [SPARK-8700][ML] Disable feature scaling in Logistic Regression All compressed sensing applications, and some of the regression use-cases will have better result by turning the feature scaling off. However, if we implement this naively by training the dataset without doing any standardization, the rate of convergency will not be good. This can be implemented by still standardizing the training dataset but we penalize each component differently to get effectively the same objective function but a better numerical problem. As a result, for those columns with high variances, they will be penalized less, and vice versa. Without this, since all the features are standardized, so they will be penalized the same. In R, there is an option for this. `standardize` Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian". +cc @holdenk @mengxr @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/dbtsai/spark lors Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/7080.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7080 commit 588c75f714372b6da4dd20fa7d006afe399fa8e2 Author: DB Tsai Date: 2015-06-24T01:06:03Z first commit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org