GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/7875

    [SPARK-8601][ML] Add an option to disable standardization for linear 
regression

    All compressed sensing applications, and some regression use cases, get
better results with feature scaling turned off. However, if we implement this
naively by training on the dataset without any standardization, the rate of
convergence will be poor. Instead, we can still standardize the training
dataset but penalize each component differently, which gives effectively the
same objective function but a better-conditioned numerical problem. As a
result, columns with high variance are penalized less, and vice versa. Without
this adjustment, since all the features are standardized, they are all
penalized the same.
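
    The reweighting described above can be illustrated with a minimal sketch in
plain Python. This is not the Spark implementation: it assumes a two-feature
ridge (L2) problem with no intercept, made-up data, scaling by the sample
standard deviation, and hypothetical helper names (solve_2x2, ridge_2d,
column_std). Training on scaled features while penalizing coefficient j by
reg / sigma_j^2, then mapping back with w_j = w_scaled_j / sigma_j, should give
the same coefficients as training on the raw features with a uniform penalty.

```python
import math

def solve_2x2(a, b):
    """Solve the 2x2 linear system a * x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]

def ridge_2d(xs, ys, penalties):
    """Minimise (1/2n) * sum((y - x.w)^2) + 0.5 * sum(penalties[j] * w[j]^2).

    Two features, no intercept; solved through the normal equations.
    """
    n = len(xs)
    gram = [[sum(x[i] * x[j] for x in xs) / n for j in range(2)] for i in range(2)]
    rhs = [sum(x[i] * y for x, y in zip(xs, ys)) / n for i in range(2)]
    gram[0][0] += penalties[0]
    gram[1][1] += penalties[1]
    return solve_2x2(gram, rhs)

def column_std(xs, j):
    """Sample standard deviation of column j (an assumption of this sketch)."""
    n = len(xs)
    mean = sum(x[j] for x in xs) / n
    return math.sqrt(sum((x[j] - mean) ** 2 for x in xs) / (n - 1))

# Made-up data: the second column has a much larger variance than the first.
xs = [[1.0, 100.0], [2.0, 250.0], [3.0, 180.0], [4.0, 420.0], [5.0, 300.0]]
ys = [3.1, 6.4, 7.9, 12.1, 13.2]
reg = 0.5

# (a) No standardization: uniform penalty on the raw features.
direct = ridge_2d(xs, ys, [reg, reg])

# (b) Standardize, but penalize component j by reg / sigma_j^2, so that
#     high-variance columns are penalized less. Map back to the original scale.
sigma = [column_std(xs, 0), column_std(xs, 1)]
scaled = [[x[0] / sigma[0], x[1] / sigma[1]] for x in xs]
w_scaled = ridge_2d(scaled, ys, [reg / sigma[0] ** 2, reg / sigma[1] ** 2])
via_scaled = [w_scaled[j] / sigma[j] for j in range(2)]

# (c) Standardize with a uniform penalty: a different objective, shown for contrast.
w_naive = ridge_2d(scaled, ys, [reg, reg])
naive = [w_naive[j] / sigma[j] for j in range(2)]

print("no standardization:      ", direct)
print("scaled, reweighted:      ", via_scaled)
print("scaled, uniform penalty: ", naive)
```

    In this sketch (a) and (b) agree up to floating-point error, while (c)
differs because a uniform penalty in the scaled space corresponds to penalizing
high-variance columns more heavily in the original space. Only the L2 case is
shown; the change of variables is the same idea for other penalties.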
    
    In R, glmnet has an option for this:
    standardize
    
    Logical flag for x variable standardization, prior to fitting the model 
sequence. The coefficients are always returned on the original scale. Default 
is standardize=TRUE. If variables are in the same units already, you might not 
wish to standardize. See details below for y standardization with 
family="gaussian".
    
    Note that the primary author for this PR is @holdenk 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark SPARK-8522

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7875.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7875
    
----
commit 00a1dc5c7550bccd481b264451251bcb5dbde4e6
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-25T01:31:08Z

    Add the param to the linearregression impl

commit 55d3a66857220631244bd0a5001ab8c6a864b7c0
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T04:02:19Z

    Add standardization param for linear regression

commit e47c57475819496fdd7cbcda9ddb7d7c0f58f538
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T05:27:30Z

    Add support for L2 without standardization.

commit e54a8a98e1dc0b16a4f03fd7ad0da92f8b6b66aa
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-26T19:13:23Z

    Fix long line

commit 99ce053603aab8c379b372d00d8b7b586c655de3
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T18:16:22Z

    merge in master

commit 0c334a256cb8004c968e9a3f9360c34d33e39a8f
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T18:18:33Z

    Remove extra line

commit b83a41e13d87864c866e996eb95e74454256cfce
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-06-30T19:11:41Z

    Expand the tests and make them similar to the other PR also providing an 
option to disable standardization (but for LoR).

commit 3f929358579da340010e2d5f8a86eaf4a1f9a994
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-10T00:04:42Z

    merge

commit eebe10a8c1eb9da6ab313c0deb38207e3c2f5fa6
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-10T00:07:09Z

    Use same comparision operator throughout the test

commit 332f14027ce5f81774a4f3b02b808dad2e1edc75
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-21T06:30:59Z

    Merge in master

commit 6b1dc09c20cb6588e3eff2ba036d2649b8b81d8d
Author: Holden Karau <hol...@pigscanfly.ca>
Date:   2015-07-21T07:24:32Z

    Merge branch 'master' into 
SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression

commit d6234ba61e020dc9c3ff314772cb3dd98c1be5dd
Author: DB Tsai <d...@netflix.com>
Date:   2015-08-02T21:53:47Z

    Merge branch 'master' into 
SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression

----


