GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/7875
[SPARK-8601][ML] Add an option to disable standardization for linear regression

All compressed sensing applications, and some regression use cases, will get better results with feature scaling turned off. However, if we implement this naively by training on the dataset without any standardization, the rate of convergence will be poor. Instead, we can still standardize the training dataset but penalize each component differently. This yields effectively the same objective function as the unstandardized problem, but a better-conditioned numerical problem. As a result, columns with high variance are penalized less (in the standardized space), and columns with low variance are penalized more. Without this option, all features are standardized and are therefore penalized equally.

In R, glmnet has an equivalent option:

    standardize
    Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian".
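The reweighting idea can be checked on a toy problem. The sketch below is illustrative only and is not Spark's implementation: it assumes an L2 (ridge) penalty, no intercept and no centering, and solves the normal equations exactly instead of using an iterative optimizer such as L-BFGS. With sigma_j the scale of column j and v_j = sigma_j * w_j the coefficient in standardized space, the original-scale penalty w_j^2 equals v_j^2 / sigma_j^2, so training on standardized features with per-feature penalty weights 1 / sigma_j^2 should return the same coefficients as training without standardization. (For an L1 penalty the analogous weight would be 1 / sigma_j.) The helper names `solve` and `weighted_ridge` and the toy data are hypothetical.

```python
"""Sketch: L2-penalized least squares on standardized features with
per-feature penalty weights reproduces the unstandardized solution.

Assumptions (illustrative, not Spark's code): no intercept, no centering,
objective (1/(2n))*||y - Xw||^2 + (lam/2)*sum_j d_j * w_j^2, solved exactly
via the normal equations.
"""
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting (a and b are copied, not modified)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_ridge(x, y, lam, weights):
    """Minimize (1/(2n))*||y - Xw||^2 + (lam/2)*sum_j weights[j]*w_j^2.

    Normal equations: (X^T X / n + lam * diag(weights)) w = X^T y / n.
    """
    n, p = len(x), len(x[0])
    a = [[sum(x[i][j] * x[i][k] for i in range(n)) / n for k in range(p)]
         for j in range(p)]
    for j in range(p):
        a[j][j] += lam * weights[j]
    b = [sum(x[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    return solve(a, b)


# Toy data: the second feature has a much larger scale than the first.
X = [[1.0, 200.0], [2.0, 50.0], [3.0, 400.0], [4.0, 120.0], [5.0, 310.0]]
y = [3.0, 4.5, 9.0, 8.0, 11.5]
lam = 0.5
p = len(X[0])

# Per-column scale sigma_j (population standard deviation here).
sigma = [statistics.pstdev([row[j] for row in X]) for j in range(p)]
X_std = [[row[j] / sigma[j] for j in range(p)] for row in X]

# (1) Standardization off: uniform penalty on the original scale.
w_plain = weighted_ridge(X, y, lam, [1.0] * p)

# (2) Same problem solved in standardized space: since w_j = v_j / sigma_j,
# the penalty w_j^2 becomes v_j^2 / sigma_j^2, i.e. weight 1 / sigma_j^2,
# so high-variance columns are penalized less in standardized space.
v = weighted_ridge(X_std, y, lam, [1.0 / s ** 2 for s in sigma])
w_from_std = [v[j] / sigma[j] for j in range(p)]

# (3) Standardization on: uniform penalty in standardized space.
v_uniform = weighted_ridge(X_std, y, lam, [1.0] * p)
w_std_on = [v_uniform[j] / sigma[j] for j in range(p)]

print("standardization off (direct):     ", w_plain)
print("standardization off (via weights):", w_from_std)
print("standardization on:               ", w_std_on)
```

The first two printed vectors should agree up to floating-point rounding, while the third (a uniform penalty in standardized space, which is the default behaviour described above) gives different coefficients, shrinking the large-scale second feature more strongly.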
Note that the primary author for this PR is @holdenk.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark SPARK-8522

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7875.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7875

----
commit 00a1dc5c7550bccd481b264451251bcb5dbde4e6
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-25T01:31:08Z

    Add the param to the linearregression impl

commit 55d3a66857220631244bd0a5001ab8c6a864b7c0
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T04:02:19Z

    Add standardization param for linear regression

commit e47c57475819496fdd7cbcda9ddb7d7c0f58f538
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T05:27:30Z

    Add support for L2 without standardization.

commit e54a8a98e1dc0b16a4f03fd7ad0da92f8b6b66aa
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T19:13:23Z

    Fix long line

commit 99ce053603aab8c379b372d00d8b7b586c655de3
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T18:16:22Z

    merge in master

commit 0c334a256cb8004c968e9a3f9360c34d33e39a8f
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T18:18:33Z

    Remove extra line

commit b83a41e13d87864c866e996eb95e74454256cfce
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T19:11:41Z

    Expand the tests and make them similar to the other PR also providing an option to disable standardization (but for LoR).
commit 3f929358579da340010e2d5f8a86eaf4a1f9a994
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-10T00:04:42Z

    merge

commit eebe10a8c1eb9da6ab313c0deb38207e3c2f5fa6
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-10T00:07:09Z

    Use same comparision operator throughout the test

commit 332f14027ce5f81774a4f3b02b808dad2e1edc75
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-21T06:30:59Z

    Merge in master

commit 6b1dc09c20cb6588e3eff2ba036d2649b8b81d8d
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-21T07:24:32Z

    Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression

commit d6234ba61e020dc9c3ff314772cb3dd98c1be5dd
Author: DB Tsai <d...@netflix.com>
Date:   2015-08-02T21:53:47Z

    Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression

----