Re: Working with a huge graph - how can I make Clojure performant?

Alan Malloy Thu, 28 Mar 2013 11:38:44 -0700

Have you looked at https://github.com/jordanlewis/data.union-find ? 
Personally, I'd prefer it to Christophe's implementation, since his blog 
post seems to start with "I dislike this algorithm"; I also helped out a 
bit in writing this version.


On Monday, March 11, 2013 10:37:39 AM UTC-7, Balint Erdi wrote:
>
> Hey,
>
>
> I got an assignment to implement an algorithm to calculate strongly 
> connected components in a graph (
> http://en.wikipedia.org/wiki/Kosaraju's_algorithm). The graph is rather 
> big, it has ~900.000 vertices.
>
>
> In its first pass, it needs to do a depth-first search on the graph and 
> calculate the finishing time for each node (the finishing time for a node 
> is a number from 0…V-1 where V is the number of vertices). Potentially 
> several depth-first search need to be launched (done in the finishing-times 
> function below) to discover all components.
>
>
> The version I pasted below is the most performant. It discovers ~600.000 
> vertices (2/3 of all vertices). However, on subsequent 
> dfs-by-finishing-times it becomes extremely slow (exploring a handful of 
> additional nodes/second) and I'm not sure why.
>
>
> Here are a few things I tried to speed up the algorithm:
>
>
> * rewrite to a recursive function (not tail-recursive) and use lazy-seqs
>
> * define dfs-by-finishing-times in finishing-times so that the whole graph 
> (G) does not have to be passed at each call.
>
> * use persistent data structures everywhere instead of transients (I know 
> this would only slow things down but it did not hurt to try)
>
> * use a profiler (VisualVM) to learn where the bottlenecks are. I'm not 
> very proficient with profilers but I could not extract any valuable 
> information
>
>
> Here are the code snippets in question:
>
> (I also pasted it here: 
> https://www.refheap.com/paste/3840cc772cc3a5b1d9c4f1db3 for better 
> readability)
>
> (defn dfs-by-finishing-times
>
>   ([G u]
>
>      (dfs-by-finishing-times G u #{}))
>
>   ([G u explored]
>
>      (loop [[v & vs :as stack] (list u), explored (transient explored), 
> lhalf [], rhalf [],  iter-cnt 0]
>
>          (if (seq stack)
>
>           (let [neighbors (persistent!
>
>                            (reduce
>
>                             (fn [c u] (if (explored u) c (conj! c u)))
>
>                             (transient [])
>
>                             (G v)))]
>
>             (cond
>
>              (explored v) (recur vs explored lhalf rhalf (inc iter-cnt))
>
>              (empty? neighbors) (recur vs (conj! explored v) (conj lhalf 
> v) rhalf (inc iter-cnt))
>
>              :else (recur (reduce (fn [stack e] (cons e stack)) vs 
> neighbors)
>
>                        (conj! explored v)
>
>                        lhalf
>
>                        (cons v rhalf)
>
>                        (inc iter-cnt))))
>
>           (concat lhalf rhalf)))
>
>      ))
>
>
> (defn finishing-times [G vertices]
>
>   "The first pass of Kosaraju's algorithm.
>
>    Scan the transpose graph of G, and mark the finishing time for each.
>
>    G should already be the transposed graph"
>
>   (loop [[u & vs :as stack] (seq vertices)
>
>           explored #{},
>
>           finished []]
>
>      (if (nil? u)
>
>        finished
>
>        (let [path (dfs-by-finishing-times G u explored)
>
>              new-explored (into explored path)]
>
>          (recur (remove new-explored vs)
>
>                 new-explored
>
>                 (into finished path))))))
>
> Do you have any insights into what technique I could use to speed up my 
> algorithm? I'm pretty sure I'm missing a key point but I'm not sure 
> what. Presumably the whole algorithm runs under 10 seconds in C# on the 
> same graph so this is rather embarrassing :)
>
> Appreciate your help,
>
> Balint
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

Re: Working with a huge graph - how can I make Clojure performant?

Reply via email to