[ https://issues.apache.org/jira/browse/SPARK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093604#comment-14093604 ]
Apache Spark commented on SPARK-2979:
-------------------------------------

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/1897

> Improve the convergence rate by minimizing the condition number in LOR with LBFGS
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-2979
>                 URL: https://issues.apache.org/jira/browse/SPARK-2979
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>
> Scaling to minimize the condition number:
>
> During optimization, the convergence rate depends on the condition number
> of the training dataset. Scaling the variables often reduces this condition
> number, which can improve the convergence rate dramatically. Without this
> reduction, training on datasets that mix columns of very different scales
> may fail to converge.
>
> The GLMNET and LIBSVM packages scale the features to reduce the condition
> number and return the weights in the original scale.
> See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
>
> Here, if useFeatureScaling is enabled, we standardize the training
> features by dividing each column by its variance (without subtracting the
> mean) and train the model in the scaled space. We then transform the
> coefficients from the scaled space back to the original scale, as GLMNET
> and LIBSVM do.
>
> Currently, this is only enabled in LogisticRegressionWithLBFGS.
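
The scale-then-unscale idea in the description can be sketched in a few lines of plain Python. This is an illustration only, not the code in the pull request: all function names are hypothetical, plain gradient descent stands in for L-BFGS, and the divisor is assumed to be each column's standard deviation (the usual meaning of "standardize"; the issue text says "variance"). The back-transformation is the same for any per-column divisor s_j: because w_scaled . (x / s) = (w_scaled / s) . x, dividing each trained weight by s_j gives a model that produces identical margins on the unscaled features.

```python
import math

def column_std(rows):
    """Per-column sample standard deviation (n - 1 denominator)."""
    n = len(rows)
    stds = []
    for col in zip(*rows):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        stds.append(math.sqrt(var))
    return stds

def scale_features(rows, stds):
    """Divide each column by its scale factor; the mean is NOT subtracted."""
    return [[v / s if s != 0.0 else 0.0 for v, s in zip(row, stds)]
            for row in rows]

def margin(w, x):
    """Linear margin w . x (no intercept in this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_logistic(rows, labels, steps=500, lr=0.1):
    """Gradient descent on the mean logistic loss (stand-in for L-BFGS)."""
    n = len(rows)
    w = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = 1.0 / (1.0 + math.exp(-margin(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

def to_original_scale(w_scaled, stds):
    """Map weights trained in the scaled space back to the original scale."""
    return [w / s if s != 0.0 else 0.0 for w, s in zip(w_scaled, stds)]

# Toy data (made up): the second column is about 1000x larger than the first.
rows = [[1.0, 2000.0], [2.0, 1000.0], [3.0, 4000.0],
        [4.0, 3000.0], [5.0, 6000.0], [6.0, 5000.0]]
labels = [0, 0, 0, 1, 1, 1]

stds = column_std(rows)
scaled = scale_features(rows, stds)          # train in the scaled space
w_scaled = train_logistic(scaled, labels)
w_orig = to_original_scale(w_scaled, stds)   # report weights in original scale
```

A caller would use w_orig directly on unscaled inputs, so the scaling stays an internal detail of training, which matches the behaviour the description attributes to GLMNET and LIBSVM.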