Hi,

I've been playing around with breaking apart a list of postal codes to
be stored in a tree with leaf nodes containing information about that
area. I have some code that works with medium-ish size inputs but
fails with a GC Overhead error with larger input sets (1.5m rows) and
would really appreciate anyone being able to point me in the right
direction.

The full code is up as a gist here: https://gist.github.com/661278

My input file contains something like:

SW1A 1,10
SW1A 2,20
...

Which are then mapped to 2 trees:

{"S" {"W" {"1" {"A" {"1" 10}}}}}
{"S" {"W" {"1" {"A" {"2" 20}}}}}

I then want to continually fold those trees into a master tree. For
the 2 maps above the merged tree would be:

{"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}

I'm sure I'm missing all kinds of awesome core/contrib functions to
make it more concise and would appreciate anyone pointing out
alternatives also.

The main problem is that it fails when my input data gets sufficiently
large. On my MacBook Pro it falls over with an input set of 1.5m
records (although a lot of these would be branches from the first few
levels). It reports GC Overhead limit exceeded, although also ran out
of heap size before I upped that.

I assume this is because during the tree reduction it's still
retaining references to nodes eventually causing it to build
continually larger structures?

I've included the reduce function (and how that gets called to produce
results) inline:

(defn merge-tree
  [tree other]
  (if (not (map? other))
    tree
    (merge-with (fn [x y] (merge-tree x y))
                tree other)))

(def results (reduce merge-tree
                     {}
                     (map record-to-tree
                          postcodes-from-file)))

All help much appreciated,
Paul

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to