Joe

I've not really looked at your code in detail, but this does
occur to me:

| -- deserialize builds a tree from a list of events;
| -- (deserialize . serialize) = id
| --
| -- This is the problematic function.
| --
| deserialize :: [Event a] -> Tree a
| deserialize events = head (fst (build events)) where
|     build :: [Event a] -> ([Tree a], [Event a])
|     build [] = ([],[])
|     build (e:es) = case e of
|       Data a ->
|           let (siblings, rest) = build es
|           in  (Tree a [] : siblings, rest)
|       StartTag a ->
|           let (children,es')  = build es
|               (siblings,rest) = build es'
|           in (Tree a children : siblings, rest)
|       EndTag -> ([],es)

Consider the Data a branch. It'll be compiled to

        let t = build es
            siblings = fst t
            rest     = snd t
        in ...

In many implementations, until you evaluate 'rest', you'll
keep the whole of 't'; which in turn contains 'siblings'.
So there's a danger of a leak here. 

This is a well known problem.  You can force t early by saying

        case build es of
          (siblings, rest) -> ...

but that may give you a different sort of space leak: you are
trying to do everthing incrementally.

GHC does a standard trick, which is to perform the 'snd'
in the garbage collector, so the original form shouldn't
leak.  I don't think Hugs does.  (But it will when we release STG
Hugs.)


I may be miles astray here.

Simon

Reply via email to