"Andrei Alexandrescu" <seewebsiteforem...@erdani.org> wrote in message
news:i9vlep$8a...@digitalmars.com...
> On 10/23/10 16:39 CDT, Nick Sabalausky wrote:
>> "Andrei Alexandrescu"<seewebsiteforem...@erdani.org> wrote in message
>> news:i9v8vq$2gv...@digitalmars.com...
>> What's wrong with regexes? That's pretty typical for lexers.
>
> I mentioned that using regexes is possible but would make it much more
> difficult to generate good quality lexers.

I see. Maybe a lexer 2.0 thing.

> Besides, regexen are IMHO quite awkward at expressing certain things that
> can be easily parsed by hand, such as comments

//[^\n]*\n
/\*(.|\*[^/])*\*/

Pretty simple as far as regexes go, and I'm far from a regex expert. Plus
there's nothing stopping the use of a vastly improved regex syntax like GOLD
uses ( http://www.devincook.com/goldparser/doc/grammars/define-terminals.htm ).
In that, the two regexes above would look like:

{LineCommentChar} = {Printable} - {LF}
LineComment = '//' {LineCommentChar}* {LF}

{BlockCommentChar} = {Printable} - [*]
{BlockCommentCharNoSlash} = {BlockCommentChar} - [/]
BlockComment = '/*' ({BlockCommentChar} | '*' {BlockCommentCharNoSlash})* '*/'

And further syntactical improvement is easy to imagine, such as in-line
character set creation.

> or recursive comments.

Granted, although I think there is precedent for regex engines that can
handle matched nested pairs just fine.
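
For anyone who wants to try the two comment patterns, here is a rough sketch in Python (not D, and not part of any proposed lexer). The names LINE_COMMENT, BLOCK_COMMENT and find_comments are made up for illustration. One assumption to flag: the block-comment pattern follows the GOLD-style definition, where the body character excludes '*', rather than the '.' in the inline regex, because '.' would also swallow '*' and run past the first '*/'.

```python
import re

# Line comment: '//', then any run of non-newline characters, then the newline.
LINE_COMMENT = re.compile(r"//[^\n]*\n")

# Block comment, mirroring the GOLD-style definition: each body item is either
# a non-'*' character or a '*' followed by a non-'/' character.
# (Assumption: '[^*]' stands in for {Printable} - [*]; it also matches newlines.)
BLOCK_COMMENT = re.compile(r"/\*(?:[^*]|\*[^/])*\*/")

# Hypothetical helper: try both patterns at each position, line comment first.
COMMENT = re.compile(LINE_COMMENT.pattern + "|" + BLOCK_COMMENT.pattern)


def find_comments(source):
    """Return every line or block comment in source, in order of appearance."""
    return [m.group(0) for m in COMMENT.finditer(source)]


sample = "a = 1; // first\nb = 2; /* two\nlines */ c = 3; /* d */\n"
print(find_comments(sample))
```

One caveat when experimenting: as written, the block pattern does not match a comment that ends in '**/' (for example '/* a **/'), because the second-to-last '*' gets paired with the final '*' and nothing is left to close the comment. A production lexer would need to handle that case.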