On Tuesday 13 May 2008, Robert Fischer wrote:
> > The problem, as I understand it, is in writing parsers.  Your standard
> > finite-automaton-based regular expression library or lexical analyzer is
> > based, at its heart, on a table lookup: you have a 2D array whose size
> > is the number of input characters times the number of states.  For ASCII
> > input, the number of possible input characters is small- 256 at most.
> > 256 input characters times hundreds of states isn't that big a table-
> > we're looking at sizes in the tens of kilobytes- easily handled even in
> > the bad old days of 64K segments.  Even going to UTF-16 ups the number
> > of input characters from 256 to 65,536- and now a moderately large state
> > machine (hundreds of states) weighs in at tens of megabytes of table
> > space.  And, of course, if you try to handle the entire 31-bit Unicode
> > code point space, welcome to really large tables :-).
> >
> > The solution, I think, is to change the implementation of your finite
> > automaton to use some data structure smarter than a flat 2D array, but
> > that's just me.
>
> Yes.  It is certainly possible to write slow code to solve this problem.
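
For concreteness, a minimal sketch of the quoted table-driven scheme; the
state count and the driver below are my illustration, not anyone's actual
lexer:

  (* Flat transition table: states x input symbols, each cell holding
     the next state, -1 meaning "reject".  300 states over 256 bytes
     is ~77K entries; the same machine over 65_536 UTF-16 code units
     is ~20M entries, i.e. tens of megabytes, as quoted above. *)
  let n_states = 300
  let n_inputs = 256                     (* bytes; 65_536 for UTF-16 *)
  let table = Array.make_matrix n_states n_inputs (-1)

  let run (accepting : bool array) (input : string) : bool =
    let n = String.length input in
    let rec go state i =
      if state < 0 then false
      else if i = n then accepting.(state)
      else go table.(state).(Char.code input.[i]) (i + 1)
    in
    go 0 0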
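
And the quoted fix, "some data structure smarter than a flat 2D array", is
usually character equivalence classes: one extra lookup maps each code point
to one of the few classes the automaton actually distinguishes, and the table
shrinks to states x classes.  The classes below are made up for illustration;
a real implementation would store class_of itself compactly, as a range table:

  (* Transition table indexed by character class, not raw code point. *)
  let n_classes = 4
  let class_of (c : int) : int =
    if c >= Char.code 'a' && c <= Char.code 'z' then 0       (* letter *)
    else if c >= Char.code '0' && c <= Char.code '9' then 1  (* digit *)
    else if c = Char.code '_' then 2                         (* underscore *)
    else 3                                                   (* everything else *)
  let table' = Array.make_matrix 300 n_classes (-1)
  let step_class state c = table'.(state).(class_of c)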

With "slow code" you could have been meaning two things:

1. The table lookup replaced wholesale by compares-and-jumps. These benefit 
from branch prediction and speculative execution, so they're not slow anymore 
(first sketch below).

2. Table "compression" used, where a few compare-and-jumps remove 
huge "unused" swaths of the table. By "unused" I meant "bomb out with an 
internal error".

I think you're being silly. Stop it.

Cheers, Kuba
