[ https://issues.apache.org/jira/browse/SPARK-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248825#comment-15248825 ]
Joseph K. Bradley commented on SPARK-14478:
-------------------------------------------

Adding a Param seems reasonable, though probably pretty low priority. To make a judgement call: how about we leave it as is for now? I'll send a PR to document that it's using unbiased variance. If any user ever needs biased, then we can add the Param (but I've never heard anyone except myself complain).

> Should StandardScaler use biased variance to scale?
> ---------------------------------------------------
>
>                 Key: SPARK-14478
>                 URL: https://issues.apache.org/jira/browse/SPARK-14478
>             Project: Spark
>          Issue Type: Question
>          Components: ML, MLlib
>            Reporter: Joseph K. Bradley
>
> Currently, MLlib's StandardScaler scales columns using the unbiased standard
> deviation. This matches what R's scale function does.
> However, it is a bit odd for 2 reasons:
> * Optimization/ML algorithms which require scaled columns generally assume
> unit variance (for mathematical convenience). That requires using biased
> variance.
> * scikit-learn, MLlib's GLMs, and R's glmnet package all use biased variance.
> *Question*: Should we switch to biased?
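
For readers unfamiliar with the distinction, here is a minimal sketch of the difference between scaling by the biased (denominator n) and unbiased (denominator n - 1) standard deviation. It uses only Python's standard library and is not Spark's StandardScaler API; the helper name `scale_column` and the sample column are made up for illustration.

```python
import statistics

def scale_column(values, biased):
    """Center a column and divide by its standard deviation.

    biased=True divides by the population std (denominator n);
    biased=False divides by the sample std (denominator n - 1).
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values) if biased else statistics.stdev(values)
    return [(v - mean) / std for v in values]

# Illustrative data only (hypothetical column values).
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(column)

scaled_biased = scale_column(column, biased=True)
scaled_unbiased = scale_column(column, biased=False)

# Measure the population (biased) variance of each scaled column.
var_biased = statistics.pvariance(scaled_biased)      # 1, up to rounding
var_unbiased = statistics.pvariance(scaled_unbiased)  # (n - 1) / n, up to rounding

print(var_biased, var_unbiased)
```

Under these assumptions, only the biased scaling yields a column whose population variance is exactly 1; the unbiased scaling leaves it at (n - 1) / n, a gap that shrinks as n grows.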