Weichen Xu created SPARK-17390:
----------------------------------

             Summary: Optimize MultivariateOnlineSummarizer by making the 
summarized target configurable
                 Key: SPARK-17390
                 URL: https://issues.apache.org/jira/browse/SPARK-17390
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Weichen Xu


Optimize MultivariateOnlineSummarizer by making the summarized target 
configurable.

For example, if we only need to summarize `mean` and `variance`, 
we only need to accumulate the following vectors: 
currMean, weightSum, and currM2n.
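
To illustrate the idea, here is a minimal sketch in plain Python of a summarizer that keeps only those three vectors. It is not Spark's implementation: the class name `MeanVarianceSummary` and its methods are hypothetical, it updates every dimension densely instead of skipping zeros, and the variance is the unbiased sample variance, which is only meaningful when every weight is 1.

```python
class MeanVarianceSummary:
    """Hypothetical reduced summarizer: tracks only mean and variance state."""

    def __init__(self, dim):
        self.curr_mean = [0.0] * dim   # running weighted mean per dimension
        self.curr_m2n = [0.0] * dim    # running sum of squared deviations
        self.weight_sum = [0.0] * dim  # accumulated weight per dimension

    def add(self, sample, weight=1.0):
        # Weighted online (Welford-style) update, one pass over the sample.
        for i, value in enumerate(sample):
            prev_mean = self.curr_mean[i]
            diff = value - prev_mean
            self.weight_sum[i] += weight
            self.curr_mean[i] = prev_mean + weight * diff / self.weight_sum[i]
            self.curr_m2n[i] += weight * diff * (value - self.curr_mean[i])
        return self

    def merge(self, other):
        # Combine two partial summaries, as the combine step of an aggregate would.
        for i in range(len(self.curr_mean)):
            wa, wb = self.weight_sum[i], other.weight_sum[i]
            total = wa + wb
            if total == 0.0:
                continue
            delta = other.curr_mean[i] - self.curr_mean[i]
            self.curr_mean[i] += delta * wb / total
            self.curr_m2n[i] += other.curr_m2n[i] + delta * delta * wa * wb / total
            self.weight_sum[i] = total
        return self

    def mean(self):
        return list(self.curr_mean)

    def variance(self):
        # Assumption: unit weights, so weight_sum is the sample count.
        return [m2n / (w - 1.0) if w > 1.0 else 0.0
                for m2n, w in zip(self.curr_m2n, self.weight_sum)]


# Two "partitions" summarized separately, then merged.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
left = MeanVarianceSummary(2)
for row in data[:2]:
    left.add(row)
right = MeanVarianceSummary(2)
for row in data[2:]:
    right.add(row)
merged = left.merge(right)
print(merged.mean(), merged.variance())
```

Min, max, L1 norm and similar statistics are simply never allocated or updated here, which is the saving the proposal is after.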

This avoids useless computation and serialization. The saving matters most 
when MultivariateOnlineSummarizer is used in RDD.aggregate: when the data 
dimension is large, the extra serialization cost is also large.
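
As a rough illustration of how the serialized state scales, the sketch below compares the pickled size of three per-dimension vectors against a larger set. The count of seven vectors for the "full" state is an assumption made for illustration, not a figure from the Spark source, and Python's `pickle` only stands in for whatever serializer Spark actually uses.

```python
import pickle
from array import array


def serialized_size(num_vectors, dim):
    """Pickled size in bytes of num_vectors double arrays of length dim."""
    state = [array("d", [0.0] * dim) for _ in range(num_vectors)]
    return len(pickle.dumps(state, protocol=pickle.HIGHEST_PROTOCOL))


dim = 100_000
reduced = serialized_size(3, dim)  # currMean, weightSum, currM2n
full = serialized_size(7, dim)     # assumed count for a full summarizer state
print(reduced, full)
```

Each vector costs about 8 bytes per dimension, so the cost grows linearly with both the dimension and the number of vectors kept, and it is paid again every time a partial result is shipped between tasks.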

Because MultivariateOnlineSummarizer is widely used, this optimization is 
worth doing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

