[ https://issues.apache.org/jira/browse/SPARK-7594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-7594.
------------------------------
    Resolution: Invalid

Please ask questions at user@
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

I think the issue is that the resulting Gramian would then have more than 2^31 entries in an internal array somewhere. At this scale you'd be passing around arrays of tens of gigabytes, which is probably well beyond what's practical for this implementation.

> Increase maximum number of columns for covariance matrix for principal
> components
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-7594
>                 URL: https://issues.apache.org/jira/browse/SPARK-7594
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Sebastian Alfers
>            Priority: Minor
>
> In order to compute a huge dataset, the number of columns allowed when calculating the
> covariance matrix is limited:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L129
> What is the reason behind this limitation, and can it be extended?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
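The arithmetic behind the limit can be sketched outside of Spark. An n x n Gramian stored in upper-triangular packed form holds n*(n+1)/2 doubles, and a JVM array is indexed by a 32-bit int, so it cannot exceed roughly 2^31 elements. A minimal Java sketch (class and method names here are hypothetical, not Spark's; the 65535-column figure matches the check in RowMatrix):

```java
public class GramianSizeCheck {
    // JVM arrays are indexed by int, so this is the hard ceiling on length.
    static final long MAX_ARRAY_ENTRIES = Integer.MAX_VALUE;

    // Number of entries in the upper-triangular packed storage of an n x n Gramian.
    static long packedEntries(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long atLimit = packedEntries(65535);   // largest column count that still fits
        long overLimit = packedEntries(65536); // one more column no longer fits
        System.out.println(atLimit + " entries, fits: " + (atLimit <= MAX_ARRAY_ENTRIES));
        System.out.println(overLimit + " entries, fits: " + (overLimit <= MAX_ARRAY_ENTRIES));
        // Even at the limit, the packed Gramian alone is ~16 GiB of 8-byte doubles,
        // which is the "tens of gigabytes" scale mentioned above.
        System.out.printf("~%.1f GiB%n", atLimit * 8.0 / (1L << 30));
    }
}
```

At 65535 columns the packed array holds 2,147,450,880 entries, just under Integer.MAX_VALUE (2,147,483,647); at 65536 columns it would overflow an int index, which is why the column count, not the row count, is what the check constrains.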