wesolows created SPARK-16478:
--------------------------------

             Summary: strongly connected components doesn't cache returned RDD
                 Key: SPARK-16478
                 URL: https://issues.apache.org/jira/browse/SPARK-16478
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.6.2
            Reporter: wesolows


The Strongly Connected Components algorithm caches its intermediate RDDs but not 
the one it returns. When the graph is large relative to available memory, any 
action on the returned RDD forces the whole RDD to be recomputed from scratch, 
which takes much longer than StronglyConnectedComponents itself. 
I managed to reproduce the issue on the Databricks platform. 
[Here|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4889410027417133/3634650767364730/3117184429335832/latest.html]
 is the notebook. 
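Until the returned graph is cached inside the library, a caller-side workaround is to cache the result yourself and materialize it once, so later actions reuse it instead of recomputing its lineage. This is a minimal sketch, assuming Spark 1.6 GraphX on the classpath and a local master; `cachedScc`, `demo`, and the four-vertex graph are hypothetical, for illustration only. Note that `Graph.cache()` uses the graph's target storage levels, which default to MEMORY_ONLY, so partitions can still be evicted and recomputed when memory is tight.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import scala.reflect.ClassTag

object SccCacheWorkaround {
  // Hypothetical helper: run SCC, then cache the *returned* graph and force it
  // to be computed once, so subsequent actions read the cached data.
  def cachedScc[VD: ClassTag, ED: ClassTag](
      graph: Graph[VD, ED],
      numIter: Int): Graph[VertexId, ED] = {
    val scc = graph.stronglyConnectedComponents(numIter).cache()
    scc.vertices.count() // materialize the vertex RDD
    scc.edges.count()    // materialize the edge RDD
    scc
  }

  // Hypothetical toy graph: 1 -> 2 -> 3 -> 1 is one SCC; 4 is its own SCC.
  // Returns a map from vertex id to the SCC label GraphX assigns it.
  def demo(): Map[VertexId, VertexId] = {
    val conf = new SparkConf().setAppName("scc-cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1), Edge(3L, 4L, 1)))
      val graph = Graph.fromEdges(edges, defaultValue = 0)
      val scc = cachedScc(graph, numIter = 10)
      val labels = scc.vertices.collect().toMap // served from the cache
      scc.unpersist(blocking = false)           // release memory when done
      labels
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    demo().toSeq.sorted.foreach { case (v, c) => println(s"$v -> $c") }
}
```

Each vertex is labeled with the lowest vertex id in its component, so vertices 1, 2 and 3 share one label and vertex 4 gets its own.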




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)