wesolows created SPARK-16478:
--------------------------------

             Summary: strongly connected components doesn't cache returned RDD
                 Key: SPARK-16478
                 URL: https://issues.apache.org/jira/browse/SPARK-16478
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.6.2
            Reporter: wesolows

The Strongly Connected Components algorithm caches its intermediate RDDs but does not cache the one it returns. When the graph is large relative to available memory, the first action on the returned RDD forces the whole RDD to be recomputed from scratch, which takes much longer than running StronglyConnectedComponents itself.

I managed to reproduce the issue on the Databricks platform. [Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html] is the notebook.
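
A possible caller-side workaround, sketched under assumptions: cache the graph returned by stronglyConnectedComponents and materialize it once, so that later actions read the cached partitions instead of replaying the lineage. The object name SccCacheSketch, the helper cachedScc, and the four-edge toy graph are hypothetical and are not taken from the linked notebook. Whether this actually avoids the recomputation described above depends on the Spark version and on available memory.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object SccCacheSketch {

  // Hypothetical helper: run SCC, then cache and materialize the result
  // on the caller's side.
  def cachedScc[VD: ClassTag, ED: ClassTag](
      graph: Graph[VD, ED],
      numIter: Int): Graph[VertexId, ED] = {
    val scc = graph.stronglyConnectedComponents(numIter)
    // cache() only marks the vertex and edge RDDs for caching.
    scc.cache()
    // Run one action on each RDD so the cache is populated now,
    // rather than on the first downstream action.
    scc.vertices.count()
    scc.edges.count()
    scc
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setAppName("scc-cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Toy graph: a cycle 1 -> 2 -> 3 -> 1 plus a dangling edge 3 -> 4.
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 0), Edge(2L, 3L, 0), Edge(3L, 1L, 0), Edge(3L, 4L, 0)))
      val graph = Graph.fromEdges(edges, defaultValue = 0)

      val scc = cachedScc(graph, numIter = 5)

      // Later actions on the result can reuse the cached partitions.
      val componentCount = scc.vertices.map(_._2).distinct().count()
      println(s"strongly connected components: $componentCount")
    } finally {
      sc.stop()
    }
  }
}
```

The sketch uses cache() rather than persist with an explicit storage level, because an RDD that already has a storage level cannot be assigned a different one.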