[ 
https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199811#comment-15199811
 ] 

Ohad Raviv commented on SPARK-13313:
------------------------------------

Hi,
I am trying to use graphx's SCC and was very concerned with this issue, so I 
have taken this dataset and ran it with python's networkx 
strongly_connected_components function and got exactly the same results of 519 
SCCs with maximal size = 4051.
So although I don't know what is the real result, the fact that both algorithms 
agree make me believe that they are correct.
I have also looked at the code and it looks fine to me, I don't agree that you 
should change the edge direction on line 89.

> Strongly connected components doesn't find all strongly connected components
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-13313
>                 URL: https://issues.apache.org/jira/browse/SPARK-13313
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.6.0
>            Reporter: Petar Zecevic
>
> Strongly connected components algorithm doesn't find all strongly connected 
> components. I was using Wikispeedia dataset 
> (http://snap.stanford.edu/data/wikispeedia.html) and the algorithm found 519 
> SCCs and one of them had 4051 vertices, which in reality don't have any edges 
> between them. 
> I think the problem could be on line 89 of StronglyConnectedComponents.scala 
> file where EdgeDirection.In should be changed to EdgeDirection.Out. I believe 
> the second Pregel call should use Out edge direction, the same as the first 
> call because the direction is reversed in the provided sendMsg function 
> (message is sent to source vertex and not destination vertex).
> If that is changed (line 89), the algorithm starts finding much more SCCs, 
> but eventually stack overflow exception occurs. I believe graph objects that 
> are changed through iterations should not be cached, but checkpointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to