Joe
I've not really looked at your code in detail, but this does
occur to me:
| -- deserialize builds a tree from a list of events;
| -- (deserialize . serialize) = id
| --
| -- This is the problematic function.
| --
| deserialize :: [Event a] -> Tree a
| deserialize events = head (fst (build events)) where
| build :: [Event a] -> ([Tree a], [Event a])
| build [] = ([],[])
| build (e:es) = case e of
| Data a ->
| let (siblings, rest) = build es
| in (Tree a [] : siblings, rest)
| StartTag a ->
| let (children,es') = build es
| (siblings,rest) = build es'
| in (Tree a children : siblings, rest)
| EndTag -> ([],es)
Consider the Data a branch. It'll be compiled to
let t = build es
siblings = fst t
rest = snd t
in ...
In many implementations, until you evaluate 'rest', you'll
keep the whole of 't'; which in turn contains 'siblings'.
So there's a danger of a leak here.
This is a well known problem. You can force t early by saying
case build es of
(siblings, rest) -> ...
but that may give you a different sort of space leak: you are
trying to do everthing incrementally.
GHC does a standard trick, which is to perform the 'snd'
in the garbage collector, so the original form shouldn't
leak. I don't think Hugs does. (But it will when we release STG
Hugs.)
I may be miles astray here.
Simon