[ 
https://issues.apache.org/jira/browse/SPARK-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-2979:
---------------------------------

    Assignee: DB Tsai

> Improve the convergence rate by minimizing the condition number in LOR with LBFGS
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-2979
>                 URL: https://issues.apache.org/jira/browse/SPARK-2979
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>
> Scaling to minimize the condition number:
>     
> During optimization, the convergence rate depends on the condition number of
> the training dataset. Scaling the variables often reduces this condition
> number, thus improving the convergence rate dramatically. Without this
> reduction, training on datasets whose columns have very different scales may
> fail to converge.
>      
> The GLMNET and LIBSVM packages perform this scaling to reduce the condition
> number and return the weights in the original scale.
> See page 9 of http://cran.r-project.org/web/packages/glmnet/glmnet.pdf
>      
> Here, if useFeatureScaling is enabled, we standardize the training features
> by dividing each column by its standard deviation (without subtracting the
> mean) and train the model in the scaled space. We then transform the
> coefficients from the scaled space back to the original scale, as GLMNET and
> LIBSVM do.
>    
> Currently, this is only enabled in LogisticRegressionWithLBFGS.
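
The effect on the condition number described in the issue can be checked on a toy example. The sketch below (plain Python, standard library only) builds a hypothetical two-column dataset whose columns differ in scale by roughly three orders of magnitude, divides each column by its sample standard deviation without subtracting the mean, and compares the condition number of X^T X before and after. Treating X^T X as a proxy for the Hessian of the logistic loss is a simplification made here, and all helper names are illustrative, not MLlib API.

```python
import math
import statistics


def column_stds(rows):
    """Sample standard deviation of each column."""
    return [statistics.stdev(col) for col in zip(*rows)]


def scale_columns(rows, stds):
    """Divide every column by its standard deviation; the mean is NOT subtracted."""
    return [[x / s for x, s in zip(row, stds)] for row in rows]


def gram_2x2(rows):
    """Entries (a, b, c) of the symmetric 2x2 matrix X^T X = [[a, b], [b, c]]."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    return a, b, c


def condition_number_2x2(a, b, c):
    """Largest / smallest eigenvalue of the positive-definite matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam_max = mid + radius
    lam_min = (a * c - b * b) / lam_max  # det / lam_max avoids cancellation in mid - radius
    return lam_max / lam_min


# Hypothetical toy data: the second column is about 1000x larger than the first.
rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 5000.0]]
raw_cond = condition_number_2x2(*gram_2x2(rows))
scaled_cond = condition_number_2x2(*gram_2x2(scale_columns(rows, column_stds(rows))))
print(f"condition number of X^T X: raw {raw_cond:.3g}, scaled {scaled_cond:.3g}")
```

On this toy data the raw condition number is on the order of 10^7, while the scaled one is below 100; what remains reflects how closely the two uncentered, all-positive columns point in the same direction.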

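The scale-then-transform-back procedure from the issue can be sketched as follows. Assumptions made for this sketch only: plain batch gradient descent stands in for L-BFGS, there is no intercept or regularization, columns are divided by their sample standard deviation, and constant columns are left unscaled. Names such as fit_with_feature_scaling are hypothetical and are not the MLlib API. Because the mean is not subtracted, an intercept term, if present, would carry over unchanged.

```python
import math
import statistics


def column_stds(rows):
    """Sample std of each column; constant columns get 1.0 (a choice made for this sketch)."""
    stds = [statistics.stdev(col) for col in zip(*rows)]
    return [s if s > 0 else 1.0 for s in stds]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_gd(rows, labels, steps=500, lr=0.5):
    """Batch gradient descent on the mean logistic loss (labels in {0, 1}).
    A stand-in for L-BFGS: no intercept, no regularization."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def fit_with_feature_scaling(rows, labels):
    """Train in the scaled space, then map the weights back to the original scale."""
    stds = column_stds(rows)
    scaled = [[x / s for x, s in zip(row, stds)] for row in rows]  # no mean subtraction
    w_scaled = train_logistic_gd(scaled, labels)
    # w_scaled . (x / std) == (w_scaled / std) . x, so divide each weight by its std.
    w_orig = [wj / s for wj, s in zip(w_scaled, stds)]
    return w_scaled, w_orig, scaled


rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 5000.0]]
labels = [0, 1, 0, 1]
w_scaled, w_orig, scaled = fit_with_feature_scaling(rows, labels)
margins_scaled = [sum(w * x for w, x in zip(w_scaled, r)) for r in scaled]
margins_orig = [sum(w * x for w, x in zip(w_orig, r)) for r in rows]
print(w_orig)
```

Dividing each scaled-space weight by its column's standard deviation leaves every margin w·x unchanged, so the returned model makes identical predictions on raw features; only the optimization itself runs on the better-conditioned problem.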


