Duncan Coutts wrote:
On Tue, 2007-02-13 at 15:27 -0500, Jefferson Heard wrote:
Hi, I am running the following code against a 210 MB file in an attempt to determine whether I should use alex or whether, since my needs are very performance oriented, I should write a lexer of my own. I thought that everything I'd written here was tail-recursive

Isn't that exactly the problem - that it's tail recursive? You do not
want it to be tail recursive since then it must consume the whole input
before producing any output. You want it to be as lazy as possible so
that it can start producing tokens as soon as possible without having to
consume everything.

Duncan is right, and I will just elaborate a little bit.

Consider the pass1 function:

pass1 :: String -> String -> String pass1 left [] = left
  pass1 left ('<':right) = pass1 left (stripTagOrComment right)
  pass1 left (' ':right) = pass1 left right
  pass1 left (c:right)
      | Set.member c punct = pass1 (' ':c:' ':left) right
      | otherwise          = pass1 (c:left) right

It accumulates its result in the "left" parameter. So it chomps down the "right" string building up a bigger and bigger solution until it reaches the base case, and hands the solution over to the calling function.

The calling function gets nothing back from pass1 until pass1 has processed the whole input. And that accumulated solution in "left" could grow quite big.

A much better approach would be:

pass1 :: String -> String pass1 [] = []
  pass1 ('<':right) = pass1 (stripTagOrComment right)
  pass1 (' ':right) = pass1 right
  pass1 (c:right)
      | Set.member c punct = ' ':c:' ': pass1 right
      | otherwise          = c : pass1 right

This way, pass1 will be producing output as early as possible, which can be consumed earlier by the calling function. Lazy evaluation gives you a kind of co-routining between producers and consumers, but you have to write "good" producers and "good" consumers.

You should also write the pass2 in this style as well. Your memory consumption should drop to something very small.

Cheers,
Bernie.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to