[ https://issues.apache.org/jira/browse/SPARK-26228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16706667#comment-16706667 ]
shahid edited comment on SPARK-26228 at 12/3/18 5:25 AM:
---------------------------------------------------------

Hi [~hibayesian], could you please share the full log of the error, if you have it? Thanks.
(By the way, 16000*16000*8 < 2^31 - 1.)

was (Author: shahid):
Hi [~hibayesian], could you please share the full log of the error, if you have it? Thanks.

> OOM issue encountered when computing Gramian matrix
> ----------------------------------------------------
>
>                 Key: SPARK-26228
>                 URL: https://issues.apache.org/jira/browse/SPARK-26228
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.3.0
>            Reporter: Chen Lin
>            Priority: Major
>
> {quote}/**
>  * Computes the Gramian matrix `A^T A`.
>  *
>  * @note This cannot be computed on matrices with more than 65535 columns.
>  */
> {quote}
> As the above note on computeGramianMatrix in RowMatrix.scala says, it
> supports computing on matrices with no more than 65535 columns.
> However, we find that it throws an OOM (Requested array size exceeds VM limit)
> when computing on matrices with 16000 columns.
> The root cause seems to be that treeAggregate writes a very long buffer array
> (16000*16000*8), which exceeds the JVM limit (2^31 - 1).
> Does RowMatrix really support computing on matrices with up to 65535
> columns?
> I suspect that computeGramianMatrix has a very serious performance issue.
> Has anyone done any performance experiments before?
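
The size arithmetic in this thread can be checked with a short script. This is only a sketch: the function names are made up for illustration, and the packed upper-triangle layout of n*(n+1)/2 doubles is an assumption about how the Gramian buffer is stored, not something stated in the thread.

```python
# Back-of-the-envelope check of the sizes discussed in this thread.
# Assumption (not stated in the thread): the aggregation buffer holds the
# packed upper triangle of the n x n Gramian, i.e. n*(n+1)/2 doubles.

JVM_MAX_ARRAY_LENGTH = 2**31 - 1  # Int.MaxValue, an upper bound on JVM array length
BYTES_PER_DOUBLE = 8


def dense_gramian_bytes(n_cols):
    """Bytes needed for a full n x n matrix of doubles."""
    return n_cols * n_cols * BYTES_PER_DOUBLE


def packed_upper_length(n_cols):
    """Number of entries in the upper triangle, diagonal included."""
    return n_cols * (n_cols + 1) // 2


def fits_in_jvm_array(length):
    """True if an array with this many elements stays within the bound."""
    return length <= JVM_MAX_ARRAY_LENGTH


if __name__ == "__main__":
    for n in (16000, 65535, 65536):
        packed = packed_upper_length(n)
        print(n, dense_gramian_bytes(n), packed, fits_in_jvm_array(packed))
```

Under these assumptions, 16000 columns stay below 2^31 - 1 whether the size is counted in bytes or in elements, while the packed length first crosses the bound at 65536 columns, which would match the documented 65535-column limit. If that holds, the array length alone does not explain the reported OOM, so the full error log is still needed.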