Hi Tamas,
I'm glad to share with you the good news that I get the issue solved !
I translate my code almost line by line from R to Python and it runs super
fast: it took less than 5 minutes to sample motifs in a graph with over one
million nodes.
However, the R code still runs very slowly, under Windows, Mac and Linux
server environments.
I am curious about the reason behind this phenomenon, so I list the
different parts between two code here:
In Python:
(1) I use neighbors method to check the neighbors:
neighbors_node1 = graph.neighbors( motif_current[0] )
node_selected = neighbors_node1[ idx_NextMotif ]
(2) to check whether the selected node is one of the neighbors of the node
in current motif:
CheckMotif = node_selected in graph.neighbors(
motif_current[1] )
(3) to form a new motif:
motif_next = [ motif_current[0], node_selected ]
In R:
(1) I apply ego to get the neighbors since ego runs faster than neigbhors:
neighbors_motif_current <- ego( g_OriginGraph, 1, motif_current )
neighbors_node1 <- neighbors_motif_current[[1]]
(2) to check whether the selected node is one of the neighbors of the node
in current motif:
node_selected <- neighbors_node1[idx_NextMotif]
CheckMotif <- are_adjacent( g_OriginGraph, node_selected,
motif_current[2] )
(3) to form a new motif:
motif_next <- c( motif_current[1], node_selected )
The above three parts are the only differences between the R and Python
code, so I am wondering if the indexing of R slows down the code, or using
c() to create a vector? This really confuses me a lot.
Thank you!
Best regards,
Phil
___________________________________________________________
Phil Hengjun Cui
Drexel University | Electrical and Computer Engineering
Philadelphia, USA
___________________________________________________________
On Wed, Aug 12, 2015 at 2:44 AM, Tamás Nepusz <[email protected]> wrote:
> Hi,
>
> > Is there a similar function to as_edgelist( graph ) but only providing
> edges incident to the specified vertices in the following way: as_edgelist(
> graph, vertices ).
> incident(graph, vertex) and incident_edges(graph, vertices) does exactly
> that if I remember correctly.
>
> > This is based on my experience that the numeric matrix returned by
> as_edgelist is more convenient and faster to further process than list.
> It could be the case, but I'm not sure whether that's the real bottleneck
> in your code. Have you tried profiling it to see which parts are
> responsible for most of the execution time?
>
> > On the other hand, I'm using python as well. So what is situation on
> the python side?
> They both rely on the same set of core C routines so I don't think there
> would be a big difference - one possible optimization would be to use
> Python's built-in set type for set intersections and exclusions (basically
> you pre-compute the incidence list for each vertex as a Python dict mapping
> vertex indices to sets). Maybe you could also use another dict as a cache
> that memorizes the computed indices for edges incident on a specific edge
> so you don't calculate it over and over again when your random walker is
> stuck in a part of a graph - but the benefits of such a solution depend on
> the structure of your graph
>
> T.
> _______________________________________________
> igraph-help mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/igraph-help
>
_______________________________________________
igraph-help mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/igraph-help