Hi, first, all this would be much-much simpler in R or Python, so you might consider using a high level language.
Second, you can define a hash function on the graph, and then calculate that for each graph. If the hash function is good, then you don't need to compare most pairs of graphs, because their hash will be different if they are different. If their hash is the same, then you need to compare them. This can be done by querying their edge lists, sorting them and then comparing them. An additional trick would be to sort the graphs according to their hash, and then only compare neighboring elements in the list with the same hash. Gabor On Tue, Jan 28, 2014 at 5:29 PM, Timothy Rice <[email protected]>wrote: > Hello, > > I have a question which may be a little similar to the backtracking matrix > question, although I don't think the solution will be the same. > > TL;DR my code has a bottle neck when repetitively calling a function that > checks an igraph_vector_ptr_t for graphs identical to another given graph. > > Overall, the program is intended to count the number of candidate > aggregates of two input trees. The algorithm starts with the edge > intersection, then incrementally add edges from the symmetric difference > whenever this preserves acyclicity. > > My initial implementation didn't check for duplicates so the count > ended up greatly exaggerated. Now that I'm checking for duplicates, each > iteration requires re-checking the vector of previously discovered trees. > The duplicate_check function takes the vector ptr and two matrices. One of > the matrices gives the adjacencies of the new candidate tree. The other > matrix is used in a loop. It holds the adjacencies from the vector of > existing trees. The code is as follows: > > igraph_bool_t duplicate_found(igraph_vector_ptr_t *N, > igraph_matrix_t *adj_g, > igraph_matrix_t *adj_N) { > long int n; > for(n = 0; n < igraph_vector_ptr_size(N); ++n) { > igraph_get_adjacency(VECTOR(*N)[n], adj_N, > IGRAPH_GET_ADJACENCY_UPPER, 0); > if(igraph_matrix_all_e(adj_g, adj_N)) { > return 1; > } > } > return 0; > } > > If the function returns 0, no duplicates were found, so the graph > corresponding to adj_g can be added to the end of the vector N. > > My code runs okay for small input trees, say up to 10 nodes, but as the > powerset of the symmetric difference grows something like exponentially, > things start slowing down a lot after that. When I compile with > "-pg" and run gprof, I see that the biggest bottleneck is the above > duplicate-checking function. Therefore, I'd like to squeeze a bit more > efficiency out of that function if possible. > > It has occurred to me that at the same time as adding new trees to the > vector N, I could also be adding their matrices to another vector; or I > could define a structure that contains both a graph and the graph's > adjacency matrix, and make an igraph_vector_ptr_t that holds instances of > this struct. This would save recomputing *adj_N more than once per tree. > > Another idea was to try using threads to check multiple trees in N > simultaneously. > > Unfortunately, implementing either of these ideas would add a lot more > complexity to my code. I'd rather avoid so much overhead if I can help it. > > So my question is, is there some faster way of scanning the vector of trees > for duplicates that isn't going to add crazy extra complexity to my > program? > To be honest, I'm a little pessimistic that any simple solution exists, but > maybe I've overlooked something due to inexperience with C and igraph. > > > Cheers, > > > Tim Rice > -- > PhD Candidate > A: Room 210, Richard Berry (Maths & Stats), Melbourne University > E: [email protected] > W: http://www.ms.unimelb.edu.au/~trice > G: https://github.com/cryptarch > > _______________________________________________ > igraph-help mailing list > [email protected] > https://lists.nongnu.org/mailman/listinfo/igraph-help > >
_______________________________________________ igraph-help mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/igraph-help
