Have you looked at https://github.com/jordanlewis/data.union-find ? Personally, I'd prefer it to Christophe's implementation, since his blog post seems to start with "I dislike this algorithm"; I also helped out a bit in writing this version.
On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote: > > Hey, > > > I got an assignment to implement an algorithm to calculate strongly > connected components in a graph ( > http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather > big, it has ~900.000 vertices. > > > In its first pass, it needs to do a depth-first search on the graph and > calculate the finishing time for each node (the finishing time for a node > is a number from 0…V-1 where V is the number of vertices). Potentially > several depth-first search need to be launched (done in the finishing-times > function below) to discover all components. > > > The version I pasted below is the most performant. It discovers ~600.000 > vertices (2/3 of all vertices). However, on subsequent > dfs-by-finishing-times it becomes extremely slow (exploring a handful of > additional nodes/second) and I'm not sure why. > > > Here are a few things I tried to speed up the algorithm: > > > * rewrite to a recursive function (not tail-recursive) and use lazy-seqs > > * define dfs-by-finishing-times in finishing-times so that the whole graph > (G) does not have to be passed at each call. > > * use persistent data structures everywhere instead of transients (I know > this would only slow things down but it did not hurt to try) > > * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not > very proficient with profilers but I could not extract any valuable > information > > > Here are the code snippets in question: > > (I also pasted it here: > https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better > readability) > > (defn dfs-by-finishing-times > > ([G u] > > (dfs-by-finishing-times G u #{})) > > ([G u explored] > > (loop [[v & vs :as stack] (list u), explored (transient explored), > lhalf [], rhalf [], iter-cnt 0] > > (if (seq stack) > > (let [neighbors (persistent! > > (reduce > > (fn [c u] (if (explored u) c (conj! c u))) > > (transient []) > > (G v)))] > > (cond > > (explored v) (recur vs explored lhalf rhalf (inc iter-cnt)) > > (empty? neighbors) (recur vs (conj! explored v) (conj lhalf > v) rhalf (inc iter-cnt)) > > :else (recur (reduce (fn [stack e] (cons e stack)) vs > neighbors) > > (conj! explored v) > > lhalf > > (cons v rhalf) > > (inc iter-cnt)))) > > (concat lhalf rhalf))) > > )) > > > (defn finishing-times [G vertices] > > "The first pass of Kosaraju's algorithm. > > Scan the transpose graph of G, and mark the finishing time for each. > > G should already be the transposed graph" > > (loop [[u & vs :as stack] (seq vertices) > > explored #{}, > > finished []] > > (if (nil? u) > > finished > > (let [path (dfs-by-finishing-times G u explored) > > new-explored (into explored path)] > > (recur (remove new-explored vs) > > new-explored > > (into finished path)))))) > > Do you have any insights into what technique I could use to speed up my > algorithm? I'm pretty sure I'm missing a key point but I'm not sure > what. Presumably the whole algorithm runs under 10 seconds in C# on the > same graph so this is rather embarrassing :) > > Appreciate your help, > > Balint > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.