Hello, I'm running an algorithm for graph structural cohesion that requires a depth-first search of subgraphs of a rather large network. The algorithm will necessarily be redundant in the subgraphs it recurses to, so to speed up the process I implemented a check at each subgraph to see if it's been searched already.
This algorithm is very slow, and takes days to complete on a graph with about 200 nodes. It seems that a significant portion of the computation time is spent looking up the current subgraph in the list of searched subgraphs to see if it is redundant, and I'm wondering if there's a faster way to do this. Currently, the list of searched subgraphs is a list (`theseSearchedBranches`) of unnamed numeric vectors. I'm checking against the list using the following command: if(list(v) %in% theseSearchedBranches){ cat(" Branch already searched: passing.\n\n") return(NULL) } v is a sorted numeric, with length anywhere between 3 and 200. Because theseSearchedBranches gets quite long as the algorithm progresses, the %in% command is taking maybe one or two seconds per lookup. So to reiterate, I have a list() that looks something like this: [[1]] [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41 [18]56 72 75 76 85 95 105 110 134 158 159 165 186 [[2]] [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41 [18]56 72 75 76 85 95 105 110 134 147 159 165 186 [[3]] [1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41 [18]50 56 72 75 85 95 105 110 134 147 158 159 165 186 ... and so on for tens of thousands of entries, and I am trying to find some sort of fast equivalent for %in% to search it. I'm also not adding the vectors to the list in any particular order, as I don't think %in% would know how to take advantage of that anyway. Is there a data structure other than list() that I can use that would be faster? Would it be better to just create a hashed env and add empty variables named "0.5.6.11.12..."? I know there are fast lookup algorithms out there that could take advantage of the fact that the items being searched are indiviually ordered numeric vectors, but I can't find any info about R implementations on the mailing lists or help. Is there any data type that implements a b-tree type of lookup scheme? Any help would be greatly appreciated. Thanks, Peter ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.