Weichen Xu created SPARK-17390:
----------------------------------

Summary: optimize MultivariantOnlineSummerizer by making the summarized target configurable
Key: SPARK-17390
URL: https://issues.apache.org/jira/browse/SPARK-17390
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Reporter: Weichen Xu

Optimize MultivariateOnlineSummarizer by making the summarized target configurable.

For example, if we only need to summarize `mean` and `variance`, we only need to accumulate the following vectors: currMean, weightSum, currM2n.

This avoids useless computation and serialization. The saving matters most when MultivariateOnlineSummarizer is used in RDD.aggregate: when the data dimension is large, the extra serialization cost is large.

Because MultivariateOnlineSummarizer is widely used, this optimization is worth doing.
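
To make the idea concrete, here is a minimal sketch in plain Python rather than Spark code. The class `ConfigurableSummarizer`, its `metrics` argument and the `allocated_buffers` helper are hypothetical names invented for illustration; they are not part of the Spark API or of any patch attached to this issue. Two simplifications are assumed: the sketch keeps `weight_sum` as a single scalar, and `variance()` returns the population form (M2n divided by the weight sum), which is not necessarily the estimator Spark uses.

```python
from typing import Iterable, List, Sequence


class ConfigurableSummarizer:
    """Hypothetical summarizer that allocates only the buffers its metrics need."""

    SUPPORTED = frozenset({"mean", "variance", "max", "min"})

    def __init__(self, dim: int, metrics: Iterable[str]) -> None:
        self.metrics = frozenset(metrics)
        unknown = self.metrics - self.SUPPORTED
        if unknown:
            raise ValueError(f"unsupported metrics: {sorted(unknown)}")
        self.dim = dim
        self.weight_sum = 0.0  # assumption: one scalar for all features
        need_mean = bool(self.metrics & {"mean", "variance"})
        # Buffers that no requested metric depends on are never allocated.
        self.curr_mean = [0.0] * dim if need_mean else None
        self.curr_m2n = [0.0] * dim if "variance" in self.metrics else None
        self.curr_max = [float("-inf")] * dim if "max" in self.metrics else None
        self.curr_min = [float("inf")] * dim if "min" in self.metrics else None

    def allocated_buffers(self) -> List[str]:
        """Names of the per-feature vectors this instance carries around."""
        names = ["curr_mean", "curr_m2n", "curr_max", "curr_min"]
        return [n for n in names if getattr(self, n) is not None]

    def add(self, sample: Sequence[float], weight: float = 1.0) -> "ConfigurableSummarizer":
        if len(sample) != self.dim:
            raise ValueError("dimension mismatch")
        if weight <= 0.0:
            return self  # samples without positive weight are ignored
        new_sum = self.weight_sum + weight
        for i, v in enumerate(sample):
            if self.curr_mean is not None:
                # Weighted Welford update of the running mean and M2n.
                delta = v - self.curr_mean[i]
                self.curr_mean[i] += weight * delta / new_sum
                if self.curr_m2n is not None:
                    self.curr_m2n[i] += weight * delta * (v - self.curr_mean[i])
            if self.curr_max is not None and v > self.curr_max[i]:
                self.curr_max[i] = v
            if self.curr_min is not None and v < self.curr_min[i]:
                self.curr_min[i] = v
        self.weight_sum = new_sum
        return self

    def merge(self, other: "ConfigurableSummarizer") -> "ConfigurableSummarizer":
        """Combine two partial summaries, as the combine step of an aggregate would."""
        if other.metrics != self.metrics or other.dim != self.dim:
            raise ValueError("summarizers are configured differently")
        if other.weight_sum == 0.0:
            return self
        total = self.weight_sum + other.weight_sum
        for i in range(self.dim):
            if self.curr_mean is not None:
                delta = other.curr_mean[i] - self.curr_mean[i]
                if self.curr_m2n is not None:
                    self.curr_m2n[i] += (other.curr_m2n[i]
                                         + delta * delta * self.weight_sum * other.weight_sum / total)
                self.curr_mean[i] += delta * other.weight_sum / total
            if self.curr_max is not None:
                self.curr_max[i] = max(self.curr_max[i], other.curr_max[i])
            if self.curr_min is not None:
                self.curr_min[i] = min(self.curr_min[i], other.curr_min[i])
        self.weight_sum = total
        return self

    def mean(self) -> List[float]:
        if "mean" not in self.metrics:
            raise ValueError("mean was not requested")
        return list(self.curr_mean)

    def variance(self) -> List[float]:
        if "variance" not in self.metrics:
            raise ValueError("variance was not requested")
        if self.weight_sum == 0.0:
            raise ValueError("no samples added")
        # Population form; a deliberate simplification for this sketch.
        return [m2n / self.weight_sum for m2n in self.curr_m2n]


# Only the mean and M2n vectors are allocated for this configuration.
summary = ConfigurableSummarizer(2, ["mean", "variance"])
summary.add([1.0, 2.0]).add([3.0, 6.0])
print(summary.allocated_buffers(), summary.mean(), summary.variance())
```

With this configuration each partial result carries two vectors of length `dim` plus one scalar. The max and min vectors are never allocated, so they would not be serialized when partial summaries are combined.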