[ https://issues.apache.org/jira/browse/SPARK-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601669#comment-14601669 ]
Apache Spark commented on SPARK-8522:
-------------------------------------

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/7024

> Disable feature scaling in Linear and Logistic Regression
> ---------------------------------------------------------
>
>                 Key: SPARK-8522
>                 URL: https://issues.apache.org/jira/browse/SPARK-8522
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: DB Tsai
>            Assignee: holdenk
>
> All compressed sensing applications, and some regression use cases, produce
> better results with feature scaling turned off. However, a naive
> implementation that simply trains on the dataset without any standardization
> will converge slowly. Instead, we can still standardize the training dataset
> but penalize each component differently, yielding effectively the same
> objective function as the unstandardized problem while keeping it numerically
> well-conditioned. As a result, columns with high variance are penalized less,
> and vice versa. Without this, all features are standardized and therefore
> penalized equally.
>
> In R, there is an option for this:
>
> `standardize`
> Logical flag for x variable standardization, prior to fitting the model
> sequence. The coefficients are always returned on the original scale. Default
> is standardize=TRUE. If variables are in the same units already, you might
> not wish to standardize. See details below for y standardization with
> family="gaussian".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
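The reweighted-penalty trick described above can be sketched numerically. This is an illustrative simplification, not Spark's actual implementation: it uses closed-form ridge (L2) regression, divides columns only by their standard deviation (no mean centering), and all variable names are hypothetical. If column j is divided by sigma_j and its squared penalty is scaled by 1/sigma_j^2, the objective in the scaled coordinates equals the unstandardized objective, so mapping the solution back recovers the unscaled coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
# Columns with very different variances (the poorly scaled case)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 0.1])
y = X @ np.array([2.0, -0.5, 3.0]) + 0.1 * rng.normal(size=100)
lam = 1.0

# Reference: ridge regression directly on the raw, unscaled features
beta_raw = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Trick: standardize each column, but scale its penalty by 1/sigma_j^2
sigma = X.std(axis=0)
Xs = X / sigma                      # standardized design matrix
D = np.diag(lam / sigma**2)         # per-component penalty matrix
w = np.linalg.solve(Xs.T @ Xs + D, Xs.T @ y)
beta = w / sigma                    # map back to the original scale

# Same minimizer as the unstandardized problem
assert np.allclose(beta, beta_raw)
```

In the scaled coordinates w_j = beta_j * sigma_j, the penalty (lam / sigma_j^2) * w_j^2 equals lam * beta_j^2 term by term, which is why high-variance columns are effectively penalized less after standardization, exactly as the description states.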