[ https://issues.apache.org/jira/browse/SPARK-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zunwen you closed SPARK-18946.
------------------------------
    Resolution: Duplicate

> treeAggregate will be low efficiency when aggregate high dimension vectors
> in ML algorithm
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18946
>                 URL: https://issues.apache.org/jira/browse/SPARK-18946
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: zunwen you
>              Labels: features
>
> In many machine learning algorithms, we have to treeAggregate large
> vectors/arrays because of the large number of features. Unfortunately, the
> treeAggregate operation of RDD becomes inefficient when the dimension of the
> vectors/arrays exceeds one million. Such high-dimensional vectors/arrays can
> occupy more than 100MB of memory, and transferring a 100MB element among
> executors is very inefficient in Spark.
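
To make the cost described in the issue concrete, here is a minimal sketch in plain Python. It is a toy model of a tree-shaped reduction, not Spark's actual treeAggregate implementation. The names `dense_vector_bytes`, `tree_aggregate`, and `fan_in` are hypothetical and chosen only for illustration, and the size estimate assumes 8 bytes per 64-bit float with no serialization overhead.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense vector of 64-bit floats, no overhead


def dense_vector_bytes(dim):
    """Approximate payload size in bytes of a dense vector with `dim` entries."""
    return dim * BYTES_PER_DOUBLE


def tree_aggregate(vectors, fan_in=2):
    """Sum equal-length vectors level by level.

    Returns (summed_vector, transfers), where `transfers` counts how many
    whole vectors had to be shipped to another node in this toy model.
    """
    transfers = 0
    level = [list(v) for v in vectors]
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), fan_in):
            group = level[start:start + fan_in]
            # The first vector of each group stays where it is;
            # every other vector in the group is sent to it in full.
            transfers += len(group) - 1
            next_level.append([sum(column) for column in zip(*group)])
        level = next_level
    return level[0], transfers


if __name__ == "__main__":
    partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    total, transfers = tree_aggregate(partials)
    print(total, transfers)
    # Payload of one partial result at 12.5 million features, in bytes.
    print(dense_vector_bytes(12_500_000))
```

In this model every merge ships a full-size vector, so the data moved grows with the number of partial results times the vector size. Under the 8-bytes-per-entry assumption, a 100MB element corresponds to roughly 12.5 million entries.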