[ https://issues.apache.org/jira/browse/SPARK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell resolved SPARK-1328.
------------------------------------

    Resolution: Fixed

> Current implementation of Standard Deviation in MLUtils may cause
> catastrophic cancellation and loss of precision.
> ------------------------------------------------------------------
>
>                 Key: SPARK-1328
>                 URL: https://issues.apache.org/jira/browse/SPARK-1328
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 0.9.0
>            Reporter: Xusen Yin
>            Assignee: Xusen Yin
>              Labels: MLLib, statistics, vector
>             Fix For: 1.0.0
>
>
> Standard Deviation (SD) is used for dataset normalization, which is useful
> when training Lasso and similar models. The current implementation of SD
> uses the second-order expectation formula E(x^2) - E^2(x), which is not
> numerically stable in floating-point arithmetic: when the mean is large
> relative to the spread, the two terms are nearly equal and their difference
> loses most of its significant digits.
> Instead, the first-order equation performs better.
> Moreover, MLUtils is not the right place to hold standard statistics
> methods; they would fit better in VectorRDDFunctions. Other affected
> machine learning algorithms should also be refined.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
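
The cancellation described in the issue can be reproduced with a small sketch. It is illustrative only and is not the MLlib code: the function names `naive_std` and `welford_std` and the sample data are assumptions made for this example. Welford's one-pass update stands in here as one well-known stable alternative; the issue text does not say which formula the fix adopted.

```python
import math


def naive_std(xs):
    """Population SD via E(x^2) - E(x)^2 (the unstable formula)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    # Rounding can push the difference below zero, so clamp before sqrt.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def welford_std(xs):
    """Population SD via Welford's running mean / M2 update (hypothetical stand-in)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n)


# Large mean, small spread: the exact population variance is 22.5.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

if __name__ == "__main__":
    print("exact  :", math.sqrt(22.5))
    print("naive  :", naive_std(data))
    print("welford:", welford_std(data))
```

With values near 1e9, each square is near 1e18, where adjacent doubles are 128 apart, so the naive subtraction cannot resolve a variance of 22.5. The Welford version only ever works with small deviations from the running mean and recovers the SD of about 4.7434.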