On 2/23/07, William K. Josephson <[EMAIL PROTECTED]> wrote:
> On Fri, Feb 23, 2007 at 01:27:56AM -0500, Joel Salomon wrote:
> > Would such a project be a worthwhile use of time?  (Might it develop
> > into the asteroid to kill the dinosaur waiting for it?)
>
> Why go to the trouble?  For C, the lexer is easy
> enough to just write by hand.

For a useful and significant subset of C, the lexer is easy enough to
just write by hand.  I was trying for full C99 (what were those ISO
guys drinking?).  I spent far too much time on it to call the task
"easy".

I have what I believe is a pretty complete C lexer
(http://www.tip9ug.jp/who/chesky/comp/lex.c).  It is still far from
being integrated into a full grammar, but it scans cpp(1) output
nicely.  I tested it against some of the odder "features" of C99—UCNs,
hex floats, &c.—and it seems to work.

Some parts were easy, some less so, and some looked easy until they
turned out to be subtly wrong.  Recognizing whether the number seen is
an integer (in decimal, octal, or hex) or a real number was one of the
hard parts, and one I gladly handed off to a regexp.  The way I
generated the regexp may not be ideal, as someone pointed out to me
off-list, but hand-generated code that recognizes what sort of number
was seen would be exactly equivalent to the regexp, and less readable.
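
To illustrate the point about classifying numbers by regexp, here is a
minimal sketch.  It is not the regexp from lex.c; it assumes POSIX
<regex.h> extended syntax rather than any particular lexer's regexp
library, and the names (numkind, numtab, classify_number) are made up
for the example.  The patterns follow my reading of the C99 grammar for
integer and floating constants.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum numkind {
    NUM_INVALID, NUM_DECIMAL, NUM_OCTAL, NUM_HEX, NUM_FLOAT, NUM_HEXFLOAT
};

#define ISUF "([uU](l|L|ll|LL)?|(l|L|ll|LL)[uU]?)?"  /* integer suffix  */
#define FSUF "[fFlL]?"                               /* floating suffix */
#define HEXD "[0-9a-fA-F]"
#define DEXP "[eE][+-]?[0-9]+"                       /* decimal exponent */
#define BEXP "[pP][+-]?[0-9]+"                       /* binary exponent  */

static const struct {
    enum numkind kind;
    const char *re;
} numtab[] = {
    { NUM_DECIMAL,  "^[1-9][0-9]*" ISUF "$" },
    { NUM_OCTAL,    "^0[0-7]*" ISUF "$" },
    { NUM_HEX,      "^0[xX]" HEXD "+" ISUF "$" },
    /* digits with a '.', exponent optional; or digits with an exponent */
    { NUM_FLOAT,    "^(([0-9]*\\.[0-9]+|[0-9]+\\.)(" DEXP ")?|[0-9]+" DEXP ")"
                    FSUF "$" },
    /* hex floats always require the binary exponent */
    { NUM_HEXFLOAT, "^0[xX](" HEXD "*\\." HEXD "+|" HEXD "+\\.?)" BEXP
                    FSUF "$" },
};

/*
 * Classify an already-scanned number token.  Each pattern is compiled
 * on every call to keep the sketch short; a real lexer would compile
 * them once at startup.
 */
enum numkind
classify_number(const char *s)
{
    size_t i;

    for (i = 0; i < sizeof numtab / sizeof numtab[0]; i++) {
        regex_t re;
        int hit;

        if (regcomp(&re, numtab[i].re, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        hit = regexec(&re, s, 0, NULL, 0) == 0;
        regfree(&re);
        if (hit)
            return numtab[i].kind;
    }
    return NUM_INVALID;
}
```

With these patterns, classify_number("0x1.8p3") should come back as
NUM_HEXFLOAT and classify_number("08") as NUM_INVALID, since 8 is not
an octal digit.  A hand-written classifier would have to encode the
same alternatives as nested conditionals.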

--Joel
