[ https://issues.apache.org/jira/browse/SPARK-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai updated SPARK-10597:
----------------------------
    Description:
MultivariateOnlineSummarizer for weighted instances is implemented as a private API for SPARK-7685.

In SPARK-7685, an online, numerically stable version of the unbiased variance estimator defined by reliability weights ([[https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights]]) is implemented, but we would like to make it a public API since there are different use cases.

Currently, `count` returns the actual number of instances and ignores instance weights, whereas `numNonzeros` returns the weighted number of nonzeros. We need to decide on their behavior before making the API public.

  was:
MultivariateOnlineSummarizer for weighted instances is implemented as a private API for #SPARK-7685.

In #SPARK-7685, an online, numerically stable version of the unbiased variance estimator defined by reliability weights ([[https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights]]) is implemented, but we would like to make it a public API since there are different use cases.

Currently, `count` returns the actual number of instances and ignores instance weights, whereas `numNonzeros` returns the weighted number of nonzeros. We need to decide on their behavior before making the API public.

> MultivariateOnlineSummarizer for weighted instances
> ---------------------------------------------------
>
>                 Key: SPARK-10597
>                 URL: https://issues.apache.org/jira/browse/SPARK-10597
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.5.0
>            Reporter: DB Tsai
>
> MultivariateOnlineSummarizer for weighted instances is implemented as a private
> API for SPARK-7685.
> In SPARK-7685, an online, numerically stable version of the unbiased variance
> estimator defined by reliability weights
> ([[https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights]])
> is implemented, but we would like to make it a public API since there are
> different use cases.
> Currently, `count` returns the actual number of instances and ignores
> instance weights, whereas `numNonzeros` returns the weighted number of nonzeros.
> We need to decide on their behavior before making the API public.
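
To make the `count` versus `numNonzeros` question concrete, here is a minimal single-feature sketch in Python of an online weighted summarizer that uses the reliability-weights variance estimator from the linked Wikipedia section. It is not Spark's implementation: the class name `WeightedOnlineSummarizer`, its fields, and the choice to skip zero-weight instances are assumptions made for illustration only.

```python
class WeightedOnlineSummarizer:
    """Hypothetical single-feature sketch; not Spark's MultivariateOnlineSummarizer."""

    def __init__(self):
        self.count = 0             # unweighted number of instances seen
        self.num_nonzeros = 0.0    # weighted number of nonzero values
        self.weight_sum = 0.0      # V1 = sum of weights
        self.weight_sq_sum = 0.0   # V2 = sum of squared weights
        self.mean = 0.0            # running weighted mean
        self._m2 = 0.0             # running sum of w * (x - mean)^2

    def add(self, x, weight=1.0):
        if weight < 0.0:
            raise ValueError("weight must be non-negative")
        if weight == 0.0:
            # Assumption: zero-weight instances are ignored entirely.
            return self
        self.count += 1
        if x != 0.0:
            self.num_nonzeros += weight
        self.weight_sum += weight
        self.weight_sq_sum += weight * weight
        # Incremental weighted update of the mean and the squared deviations.
        delta = x - self.mean
        self.mean += (weight / self.weight_sum) * delta
        self._m2 += weight * delta * (x - self.mean)
        return self

    @property
    def variance(self):
        # Unbiased estimator for reliability weights: M2 / (V1 - V2 / V1).
        if self.weight_sum == 0.0:
            return 0.0
        denom = self.weight_sum - self.weight_sq_sum / self.weight_sum
        return self._m2 / denom if denom > 0.0 else 0.0


s = WeightedOnlineSummarizer()
for x, w in [(2.0, 1.0), (0.0, 2.0), (4.0, 3.0)]:
    s.add(x, w)
print(s.count, s.num_nonzeros)  # prints "3 4.0"
print(s.mean, s.variance)
```

With weights 1, 2, and 3, `count` reports 3 instances while `num_nonzeros` reports a weighted total of 4.0, which is the inconsistency the issue asks to resolve. With all weights equal to 1, the variance reduces to the usual unbiased sample variance.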