[ https://issues.apache.org/jira/browse/SPARK-36420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-36420:
---------------------------------
    Fix Version/s:     (was: 3.3.0)

> Use `isEmpty` to improve performance in Pregel's superstep
> ----------------------------------------------------------
>
>                 Key: SPARK-36420
>                 URL: https://issues.apache.org/jira/browse/SPARK-36420
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 2.4.7
>            Reporter: xiepengjie
>            Priority: Minor
>
> When I was running `Graphx.connectedComponents` on a graph with 20+ billion
> vertices and edges, I found that `count` is very slow.
> {code:java}
> object Pregel extends Logging {
>   ...
>   def apply[VD: ClassTag, ED: ClassTag, A: ClassTag] (...): Graph[VD, ED] = {
>     ...
>     // Maybe messages.isEmpty() is better than messages.count()
>     var activeMessages = messages.count()
>     // Loop
>     var prevG: Graph[VD, ED] = null
>     var i = 0
>     while (activeMessages > 0 && i < maxIterations) {
>       ...
>       activeMessages = messages.count()
>       ...
>     }
>     ...
>     g
>   } // end of apply
> } // end of class Pregel
> {code}
> All we need here is an action that tells us whether any active messages
> remain, not how many there are, so `isEmpty` is a better fit than `count`.
> I verified this change and it worked very well.
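
For readers following the thread, the proposed change can be sketched outside of Pregel as below. This is only an illustration under stated assumptions, not the patch attached to the issue: the object name `IsEmptyVsCount`, the two helper functions, and the toy `messages` RDD are hypothetical, and the example assumes `spark-core` is on the classpath and runs in local mode. The only Spark calls it relies on are `RDD.count()`, `RDD.isEmpty()`, `SparkContext.parallelize` and `SparkContext.emptyRDD`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of GraphX.
object IsEmptyVsCount {

  // Style currently used in the loop condition: counts every element in
  // every partition, then compares the total with zero.
  def hasMessagesViaCount[A](messages: RDD[A]): Boolean =
    messages.count() > 0

  // Style proposed in the issue: only asks whether at least one element exists.
  def hasMessagesViaIsEmpty[A](messages: RDD[A]): Boolean =
    !messages.isEmpty()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("isEmpty-vs-count").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy stand-in for the Pregel `messages` RDD (vertex id -> message).
      val messages: RDD[(Long, Int)] =
        sc.parallelize((1L to 100000L).map(id => (id, 1)), numSlices = 8)
      val noMessages: RDD[(Long, Int)] = sc.emptyRDD[(Long, Int)]

      // Both checks should agree on whether the loop has to continue.
      println(hasMessagesViaCount(messages) == hasMessagesViaIsEmpty(messages))
      println(hasMessagesViaCount(noMessages) == hasMessagesViaIsEmpty(noMessages))
    } finally {
      sc.stop()
    }
  }
}
```

One point that may be worth checking before adopting this in Pregel itself: `count()` evaluates every partition of `messages`, whereas `isEmpty()` can stop as soon as it finds one element. If the loop relies on the action to fully materialize a cached `messages` RDD, the two calls are not interchangeable in that respect, even though they return the same answer for the loop condition.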