I thought Limbo Peng wanted to measure the transliteration of some Go algorithm. Wasn't there a remark somewhere that said "there is a faster algorithm based on union-find"?
On Jan 20, 2013, at 7:11 PM, Danny Yoo wrote: >> As the profiler shows, the most time-consuming part lies in the for-loop of >> procedure "qf-connect!", and after a few experiments, I figure it out: it is >> the "in-indexed" procedure that makes it slow. I replaced "in-indexed" with >> separate calls to "in-range" and "in-vector" - the running time gets reduced >> from 4m to 27s. Why? > > Hmmm.. Interesting question. It also seems to be that using: > > (in-naturals) > > instead of: > > (in-range n) > > improves the performance a bit more: on my system, that single change > cuts down the runtime from about 22 seconds to about 15 seconds. > Interesting. So we'd have to take a closer look at the implementation > of in-indexed and in-range to be able to better isolate what's taking > the time. > > > ... wait... looking more closely at the algorithm... isn't this > supposed to be a straightforward union-find? If so, why does there > need to be any vector searching going on here? That is, that step in > qf-connect! that's walking all the elements in the vector. Why in the > world is it doing that? > > ... I don't understand what this is computing yet... reading the code > a few more times... > > ... ok, I think I understand what this is doing now. But it can be > greatly improved. > > The representation that you have chosen for the elements of the set > are bare numbers. However, I think you want to have more structure: > for example, to be able to immediately jump to the representative. > That way, you avoid hunting through every vector element to find the > representative, which is currently the dominating factor of your > program. > > That is, it's certainly useful to optimize in-vector vs. in-indexed > vs. in-naturals, but maybe it's a better idea to avoid iteration > altogether! > > > I already have an implementation of the standard union-find algorithm > on PLaneT that you can use: > > http://planet.racket-lang.org/package-source/dyoo/union-find.plt/1/0/ > > > Here's what the heart of your program should look like with it: > > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > #lang racket/base > > (require racket/list) > (require racket/string) > (require profile > (planet dyoo/union-find/union-find)) > > (module+ main (profile-thunk main)) > > (define (main) > (let* ([n (string->number (read-line))] > [set (make-qf-set n)]) > (for ([line (in-lines)]) > (let* ([pair (string-split line " " #:trim? #f)] > [p (string->number (car pair))] > [q (string->number (cadr pair))]) > (unless (qf-connected? set p q) > (qf-connect! set p q)))) > > ;; TODO: do the output > (printf "Number of components: ~s\n" (qf-count set)) > )) > > (struct qf-set ([count #:mutable] forest) #:transparent) > (define (make-qf-set n) (qf-set n (make-qf-forest n))) > (define (make-qf-forest n) > (define a-forest (make-forest 'equal)) > (for ([i (in-range n)]) (make-set a-forest i)) > a-forest) > > > ;; qf-connected?: qf-set number number -> boolean > (define (qf-connected? set p q) > (= (qf-find set p) > (qf-find set q))) > > > ;; better - it takes about 28s to finish > ;; but still slower than the Golang version > (define (qf-connect! set p q) > (let* ([p-root (qf-find set p)] > [q-root (qf-find set q)] > [forest (qf-set-forest set)]) > (unless (= p-root q-root) > (union-set forest p-root q-root) > (set-qf-set-count! set (sub1 (qf-set-count set)))))) > > > ;; qf-find: qf-set number -> number > (define (qf-find set p) > (define forest (qf-set-forest set)) > (find-set forest p)) > > (define qf-count qf-set-count) > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; > > > > It takes about 200 milliseconds to run to completion on my system. > > ____________________ > Racket Users list: > http://lists.racket-lang.org/users ____________________ Racket Users list: http://lists.racket-lang.org/users

