For something simple like the C# preprocessor, it's easy to simply use lexer modes. I'm pretty happy with the method I posted because it's allowed me to integrate multiple lexers into a single "big" lexer almost as easily as it's been to create lexer modes. For example, the following uses 4 lexers (HTML text, HTML tags, PHP code, PHP doc comments):
http://visualstudiogallery.msdn.microsoft.com/2a10ba81-26c5-47d9-939b-6bcc7b bec251/showImage/53676 The following uses 3 lexers (group, template text, template expression): http://visualstudiogallery.msdn.microsoft.com/5ca30e58-96b4-4edf-b95e-3030da f474ff/showImage/53651 Sam From: chris king [mailto:kingce...@gmail.com] Sent: Friday, July 29, 2011 1:42 AM To: Sam Harwell Cc: antlr-interest@antlr.org Subject: Re: Have I found an Antlr CSharp3 lexer bug if... Sam, thanks again for taking the time to put that together for me. I owe you a beer. And if I had a deadline to meet I'd copy paste your code and move on. But what did Terence say in his book? Why spend 5 minutes doing something procedurally when you can spend 5 weeks doing it declaratively? For better or for worse, I think I suffer from the same affliction. :) So please allow me to play Devils Advocate for a second and try and propose a declarative solution using both lexer and parser (and possibly requiring new ANTLR declarations). And finally I'll hop up on my soap box and suggest that such a solution is a worthy goal of ANTLR. Basically the idea is to take what you've done procedurally, generalize it, and expose it declaratively. To do this I'd introducing declarative syntax which would allow lexer rules to be partitioned and then allow those partitions to be toggled on and off based on context known to the parser. Given that, I think the pre-processor could be done without procedural logic. The partitioning syntax could be borrowed from C# custom attribute syntax. So // No declaration so in default partition VERBATIM_STRING_LITERAL : '@"' F_VERBATIM_STRING_LITERAL_CHARACTER* '"' ; [Partition("PP")] // Part of pre-processor partition DEFINE: F_POUND_SIGN 'define'; [Partition("IfDefedOut")] // Only member of if-defed-out partition PP_SKIPPED_CHARACTERS : ( ~(F_NEW_LINE_CHARACTER | F_POUND_SIGN) F_INPUT_CHARACTER* F_NEW_LINE )* ; Putting it together this is how it could solve the verbatim string problem: #if false @" #endif We start by partition our lexer rules into three groups. First (1) would be pre-processor, second (2) C# grammer (including the verbatim string rule), and third (3) would be the rule to "consume all text till the next pragma" rule. After establishing the partitions we need to add logic to toggle them on and off. We'd add that logic to the #if/#elif/#else parser rules. That logic would detect that the #if false expression trivially evaluates to false and would disable the (2) C# partition and activate the third (3) "consume all text till the next pragma" partition thereby deactivating the verbatim rule and allowing all text up to the #endif to be consumed. Basically all this is repeating exactly what you've done procedurally. You already established these groups in your solution. The first (1) is the set of rules in the lexer, the second (2) is implemented separately and handed tokens as they are produced by the pre-processor, and the third (3) is implemented procedurally with the regexs. And you switch to that third (3) partition and disable the second (2) at the point where you intercept the the input stream. So both approaches are basically doing the same thing. Finally, why would ANTLR want do do this? Well, because enabling this scenario is right up ANTLRs design-philosophy-alley! ANTLR makes common compiler building tasks simple. And it seems to me that pre-processors fall into this bucket of common-compiler-problems-that-could-have-simpler-solutions. And that the C# pre-processor is so extremely basic, as you say, makes it ripe as a first goal of a declarative solution. I know pre-processors have always been done in the lexer but I got the feeling reading Terence's book that he was more than happy to take a second look at common compiler problems and solve 'em in different ways by introducing previously unheard of concepts (at least to me): Lexers and parsers in one file. Semantic and syntactic predicates. Infinite look ahead with back tracking. And my favorite example of ANTLR simplifying the declarative syntax for a common parsing problem is DELIMITED_COMMENT: '/*' .* '*/'; Wow. Why wasn't it that easy with lex/yacc?! Maybe the time has come for us to take a look at the idea that the pre-processor needs to be done in the lexer and see if we can simplify that too! So, what do you think? Can it be done? And thanks again for your time and the VS tools! Thanks, Chris List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.