I could be missing something, but since you say you're running into
problems as your data gets large, I think you're possibly running into
2 things:

1. You're reading all the data into memory, and keeping it there, in
the form of the lines from the file.
2. The way you're defining postcodes-from-file results in holding on
to the head of the sequence, meaning parts of the sequence that aren't
being used anymore can't be garbage collected. You probably want to
make this into a function rather than a def, so that it gets created
when you invoke the function, but not bound anywhere that causes it to
be a gc root.

Does that make sense?

On Nov 3, 9:22 am, Paul Ingles <p...@forward.co.uk> wrote:
> Hi,
>
> I've been playing around with breaking apart a list of postal codes to
> be stored in a tree with leaf nodes containing information about that
> area. I have some code that works with medium-ish size inputs but
> fails with a GC Overhead error with larger input sets (1.5m rows) and
> would really appreciate anyone being able to point me in the right
> direction.
>
> The full code is up as a gist here:https://gist.github.com/661278
>
> My input file contains something like:
>
> SW1A 1,10
> SW1A 2,20
> ...
>
> Which are then mapped to 2 trees:
>
> {"S" {"W" {"1" {"A" {"1" 10}}}}}
> {"S" {"W" {"1" {"A" {"2" 20}}}}}
>
> I then want to continually fold those trees into a master tree. For
> the 2 maps above the merged tree would be:
>
> {"S" {"W" {"1" {"A" {"1" 10 "2" 20}}}}}
>
> I'm sure I'm missing all kinds of awesome core/contrib functions to
> make it more concise and would appreciate anyone pointing out
> alternatives also.
>
> The main problem is that it fails when my input data gets sufficiently
> large. On my MacBook Pro it falls over with an input set of 1.5m
> records (although a lot of these would be branches from the first few
> levels). It reports GC Overhead limit exceeded, although also ran out
> of heap size before I upped that.
>
> I assume this is because during the tree reduction it's still
> retaining references to nodes eventually causing it to build
> continually larger structures?
>
> I've included the reduce function (and how that gets called to produce
> results) inline:
>
> (defn merge-tree
>   [tree other]
>   (if (not (map? other))
>     tree
>     (merge-with (fn [x y] (merge-tree x y))
>                 tree other)))
>
> (def results (reduce merge-tree
>                      {}
>                      (map record-to-tree
>                           postcodes-from-file)))
>
> All help much appreciated,
> Paul

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to