Why not just sort the text file and then build the merged trees
directly, without the numerous intermediate trees?

On Wed, Nov 3, 2010 at 12:22 PM, Paul Ingles <p...@forward.co.uk> wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing information about that
> area. I have some code that works with medium-ish size inputs but
> fails with a GC Overhead error with larger input sets (1.5m rows) and
> would really appreciate anyone being able to point me in the right
> direction.
>
> The full code is up as a gist here: https://gist.github.com/661278
>
> My input file contains something like:
>
> SW1A 1,10
> SW1A 2,20
> ...
>
> Which are then mapped to 2 trees:
>
> {"S" {"W" {"1" {"A" {"1" 10}}}}}
> {"S" {"W" {"1" {"A" {"2" 20}}}}}
>
> I then want to continually fold those trees into a master tree. For
> the 2 maps above the merged tree would be:
>
> {"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}
>
> I'm sure I'm missing all kinds of awesome core/contrib functions to
> make it more concise and would appreciate anyone pointing out
> alternatives also.
>
> The main problem is that it fails when my input data gets sufficiently
> large. On my MacBook Pro it falls over with an input set of 1.5m
> records (although a lot of these would be branches from the first few
> levels). It reports GC Overhead limit exceeded, although also ran out
> of heap size before I upped that.
>
> I assume this is because during the tree reduction it's still
> retaining references to nodes eventually causing it to build
> continually larger structures?
>
> I've included the reduce function (and how that gets called to produce
> results) inline:
>
> (defn merge-tree
>  [tree other]
>  (if (not (map? other))
>    tree
>    (merge-with (fn [x y] (merge-tree x y))
>                tree other)))
>
> (def results (reduce merge-tree
>                     {}
>                     (map record-to-tree
>                          postcodes-from-file)))
>
> All help much appreciated,
> Paul
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en



-- 
Cheers,
Leif

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to