[ https://issues.apache.org/jira/browse/SPARK-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai updated SPARK-10597:
----------------------------
    Target Version/s:   (was: 1.6.0)

> MultivariateOnlineSummarizer for weighted instances
> ---------------------------------------------------
>
>                 Key: SPARK-10597
>                 URL: https://issues.apache.org/jira/browse/SPARK-10597
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.5.0
>            Reporter: DB Tsai
>            Assignee: DB Tsai
>
> MultivariateOnlineSummarizer for weighted instances is implemented as a
> private API for SPARK-7685.
> SPARK-7685 implements an online, numerically stable version of the unbiased
> variance estimator defined by reliability weights:
> https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights
> We would like to make it a public API, since it has other use cases.
> Currently, `count` returns the actual number of instances and ignores
> instance weights, but `numNonzeros` returns the weighted number of nonzeros.
> We need to decide how both should behave before making the API public.
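
To make the `count` versus `numNonzeros` question concrete, here is a minimal Python sketch of a single-feature online summarizer. It is not Spark's Scala implementation. It uses a standard weighted incremental update for the mean and the reliability-weights unbiased variance, M2 / (W - W2 / W), where W is the sum of weights and W2 the sum of squared weights. The class name `WeightedSummarizer`, its field names, and the choice to skip zero-weight instances are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class WeightedSummarizer:
    """Hypothetical single-feature sketch, not Spark's MultivariateOnlineSummarizer."""

    count: int = 0              # number of instances seen, ignoring weights
    weight_sum: float = 0.0     # W  = sum of w_i
    weight_sq_sum: float = 0.0  # W2 = sum of w_i^2
    mean: float = 0.0           # weighted running mean
    m2: float = 0.0             # sum of w_i * (x_i - mean)^2
    weighted_nnz: float = 0.0   # sum of w_i over instances with x_i != 0

    def add(self, x: float, w: float = 1.0) -> "WeightedSummarizer":
        if w < 0:
            raise ValueError("weight must be non-negative")
        if w == 0:
            # Assumption: zero-weight instances are ignored entirely.
            return self
        self.count += 1
        self.weight_sum += w
        self.weight_sq_sum += w * w
        if x != 0:
            self.weighted_nnz += w
        # Weighted incremental update of the mean and of M2.
        delta = x - self.mean
        self.mean += (w / self.weight_sum) * delta
        self.m2 += w * delta * (x - self.mean)
        return self

    def variance(self) -> float:
        """Unbiased variance under reliability weights; 0.0 if undefined."""
        if self.weight_sum == 0:
            return 0.0
        denom = self.weight_sum - self.weight_sq_sum / self.weight_sum
        return self.m2 / denom if denom > 0 else 0.0


if __name__ == "__main__":
    s = WeightedSummarizer()
    for x, w in [(0.0, 0.5), (2.0, 1.0), (4.0, 2.5)]:
        s.add(x, w)
    # count ignores the weights, while weighted_nnz sums them.
    print(s.count, s.weighted_nnz, s.mean, s.variance())
```

In this sketch `count` and `weighted_nnz` are on different scales (instances versus total weight), which is the inconsistency the issue asks to resolve. With all weights equal to 1, the variance reduces to the usual sample variance with the n - 1 denominator.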