[ https://issues.apache.org/jira/browse/SPARK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai updated SPARK-2979:
---------------------------

    Summary: Improve the convergence rate by minimizing the condition number in LOR with LBFGS  (was: Improve the convergence rate by minimize the condition number in LOR with LBFGS)

> Improve the convergence rate by minimizing the condition number in LOR with LBFGS
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-2979
>                 URL: https://issues.apache.org/jira/browse/SPARK-2979
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>
> Scaling to minimize the condition number:
>
> During the optimization process, the convergence rate depends on the
> condition number of the training dataset. Scaling the variables often reduces
> this condition number, which can improve the convergence rate dramatically.
> Without this reduction, training may fail to converge on datasets whose
> columns have very different scales.
>
> The GLMNET and LIBSVM packages perform this scaling to reduce the condition
> number, and return the weights in the original scale.
> See page 9 in http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
>
> Here, if useFeatureScaling is enabled, we standardize the training
> features by dividing each column by its variance (without subtracting the
> mean), and train the model in the scaled space. We then transform the
> coefficients from the scaled space back to the original scale, as GLMNET and
> LIBSVM do.
>
> Currently, it's only enabled in LogisticRegressionWithLBFGS.
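
The scaling and back-transformation described in the issue can be sketched in a few lines. This is an illustration only, not the MLlib implementation: the helper names (column_scales, scale_rows, to_original_scale, margin) are hypothetical, the weights are placeholder numbers rather than the output of LBFGS, and the per-column factor is assumed to be the standard deviation (the issue text says variance; the same back-transformation holds for any positive per-column factor). Because the mean is not subtracted, the intercept needs no adjustment.

```python
import math

def column_scales(rows):
    """Per-column population standard deviation; constant columns get 1.0."""
    n = len(rows)
    scales = []
    for j in range(len(rows[0])):
        mean = sum(r[j] for r in rows) / n
        var = sum((r[j] - mean) ** 2 for r in rows) / n
        std = math.sqrt(var)
        scales.append(std if std > 0.0 else 1.0)
    return scales

def scale_rows(rows, scales):
    """Divide each column by its factor; the mean is NOT subtracted."""
    return [[x / s for x, s in zip(r, scales)] for r in rows]

def to_original_scale(weights_scaled, scales):
    """Map weights learned in the scaled space back to the original scale."""
    return [w / s for w, s in zip(weights_scaled, scales)]

def margin(weights, intercept, row):
    """Linear margin w . x + b, the input to the logistic function."""
    return intercept + sum(w * x for w, x in zip(weights, row))

# Two columns with very different scales (made-up data).
rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 5000.0]]
scales = column_scales(rows)
scaled = scale_rows(rows, scales)

# Placeholder values standing in for what an optimizer would return
# after training on the scaled features.
weights_scaled = [0.5, -1.25]
intercept = 0.1

weights = to_original_scale(weights_scaled, scales)

# The model gives the same margin on raw features with the transformed
# weights as it does on scaled features with the scaled weights.
for raw, sc in zip(rows, scaled):
    a = margin(weights, intercept, raw)
    b = margin(weights_scaled, intercept, sc)
    assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
```

The identity being checked is that dividing feature j by s_j and dividing weight j by s_j cancel in the dot product, so predictions are unchanged while the optimizer works on better-conditioned data.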