[il-antlr-interest: 30260] Re: [antlr-interest] How to force error recovery?
Greetings,

I ran into the same issue. As you probably noticed, when the lookahead doesn't match a statement, the parser breaks out of the * loop and tries to match EOF. Instead of calling compilationUnit(), I resorted to calling statement() in a loop to force parsing to continue regardless of errors. It seems to work well enough. It would be good to know if there is a better way to handle this, though.

Best,
J

On 10/4/2010 3:27 PM, Edson Tirelli wrote:
> Hi all,
>
> Look at this simple grammar:
>
>     grammar testGrammar;
>     options { output=AST; }
>     compilationUnit : statement* EOF ;
>     statement : A^ | B^ C ;
>     A : 'a';
>     B : 'b';
>     C : 'c';
>     WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ;
>
> Using the above grammar, it will successfully parse an input like:
>
>     a b c a
>
> Now, if the input is:
>
>     a c a
>
> The generated parser will parse a, and will fail at c, as it is not a valid statement. Reading the error recovery chapter in the ANTLR book, I would imagine ANTLR would delete/skip the c token and try to recover, successfully parsing the second a, as that is a valid statement again. But it is not working like this. It is aborting the parsing with an error at c.
>
> Question: how do I force it to recover from the error and continue parsing?
>
> The actual scenario is that the parser I am working on is used by an IDE environment (Eclipse), so we need it to continue parsing and present the users with all the errors found in the file, not just the first one. The error recovery seems to work on some rules, but not on the top rule (compilationUnit).
>
> Thanks,
> Edson

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups il-antlr-interest group. To post to this group, send email to il-antlr-inter...@googlegroups.com. To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
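
The "call statement() in a loop" workaround described in this thread can be sketched without ANTLR at all. The following is a minimal, hypothetical stand-in for the generated parser (the class StatementLoopSketch and its members are invented for illustration and are not ANTLR APIs); it assumes the same toy language, statement : A | B C, and a deliberately crude recovery step that drops one token after each error.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a generated parser with the rule: statement : A | B C ; */
class StatementLoopSketch {
    final List<String> tokens = new ArrayList<>();
    final List<String> statements = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    int pos = 0;

    StatementLoopSketch(String input) {
        // Whitespace is skipped, mirroring the WS rule on the hidden channel.
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    /** Matches one statement, or throws if the lookahead cannot start one. */
    void statement() {
        String la = tokens.get(pos);
        if (la.equals("a")) {
            pos++;
            statements.add("a");
        } else if (la.equals("b") && pos + 1 < tokens.size()
                && tokens.get(pos + 1).equals("c")) {
            pos += 2;
            statements.add("b c");
        } else {
            throw new IllegalStateException("unexpected '" + la + "' at token " + pos);
        }
    }

    /** Driver loop: call statement() until EOF instead of calling the top rule once. */
    static StatementLoopSketch parseAll(String input) {
        StatementLoopSketch p = new StatementLoopSketch(input);
        while (p.pos < p.tokens.size()) {
            try {
                p.statement();
            } catch (IllegalStateException e) {
                p.errors.add(e.getMessage());
                p.pos++; // crude recovery: drop the offending token and retry
            }
        }
        return p;
    }
}
```

With the input "a c a" this sketch records two statements and one error, which is the behavior the IDE scenario asks for. A real ANTLR 3 driver would additionally have to assemble the per-statement trees itself, as the follow-up message notes.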
[il-antlr-interest: 30262] Re: [antlr-interest] How to force error recovery?
Thanks for passing on the wiki link. It's definitely smarter than the crude approach I took, which requires additional tree assembly as well as error recovery adjustment to eat up unexpected tokens.

J

On 10/4/2010 4:26 PM, Edson Tirelli wrote:
> Thanks for the suggestion. I just found this:
>
> http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
>
> I am trying to check if it works for my case. Otherwise I will try your approach.
>
> Edson
[il-antlr-interest: 30152] Re: [antlr-interest] Literal - ID clash resolution
I'd guess the common first stab would be to have '=' as a distinct token and elevate ID into a parsing symbol:

    EQUAL: '=';
    id: (ID | EQUAL) ;
    option: optionName=id EQUAL STRING ;

It might become a bit more interesting if you also need to make optionName optional. But maybe you could rewind the stack a bit, and reconsider whether you really want/need '=' as a valid identifier?

J

On 9/14/2010 10:15 AM, Bill Andersen wrote:
> Folks
>
> I'm having a small problem. Not that I can't solve it myself, but it's one of those things for which:
>
> a) I'm sure there exists a good stock solution, and
> b) Google is especially poorly suited to find in a search
>
> Here it is. I have rules in the grammar for my DSL that have '=' as a literal appearing in them. Like this:
>
>     option : optionName=ID '=' STRING ;
>
> The DSL parses a language specification, and that specification can define reserved words, one of which (in my test case) is '='. This creates a problem: the DSL grammar must recognize '=' as an instance of identifier (ID - I'm using the ANTLRWorks default lexer rule template, slightly modified, for now), but it can't recognize '=' as such because it's already a literal used in the DSL grammar.
>
> Can anyone tell me what the best way to deal with this is?
>
> If my explanation doesn't make sense (seems mine often don't for some reason) I'll be glad to post the whole grammar, but I don't think that's necessary.
>
> .bill
[il-antlr-interest: 29778] Re: [antlr-interest] Doubt About rewrite rules
Hi Victor,

Victor Giordano wrote:
> Hi, I am a newbie. Trying to figure out how to work with AST tree and ... but if I want to use rewrite rules, how do I treat the repetition EBNF operators like * or +?
>
>     expr : term (('+'|'-') term)* -> term ^(('+'|'-') term)* ;

Try this:

    expr: ( term -> term ) ( ( '+' | '-' ) term -> ^( ( '+' | '-' ) $expr term ) )* ;

Not sure if the terms need to be distinguished with labels. The Antlr reference book describes the use of rewrite rules inside subrules in more detail.

J
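
To see what referring to $expr inside the loop buys you, here is a minimal sketch in plain Java that mimics the tree shape such a rewrite is meant to produce: on each iteration the tree built so far becomes the left child of the new operator node. The class ExprRewriteSketch and its build method are hypothetical, and the input is assumed to be whitespace-separated terms and operators; this is not ANTLR-generated code.

```java
/** Mimics: expr : (term -> term) ( op term -> ^(op $expr term) )* ; */
class ExprRewriteSketch {
    /** Returns the tree in ANTLR's LISP-like notation, e.g. (+ 1 2). */
    static String build(String input) {
        String[] t = input.trim().split("\\s+");
        String tree = t[0];                           // ( term -> term )
        for (int i = 1; i + 1 < t.length; i += 2) {
            // -> ^( op $expr term ): the tree so far becomes the left child
            tree = "(" + t[i] + " " + tree + " " + t[i + 1] + ")";
        }
        return tree;
    }
}
```

For "1 + 2 - 3" the sketch yields (- (+ 1 2) 3), i.e. a left-associative tree, which is the usual reason for rewriting per iteration rather than once for the whole rule.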
[il-antlr-interest: 29726] Re: [antlr-interest] Building a tree grammar expression to recognize arithmetic expressions
Hi Alex,

Alex Storkey wrote:
> Hi, it's my first time posting in a mailing list like this so go easy on me if I'm breaking some etiquette or anything :)
>
> I'm trying to construct an expression in my tree grammar to recognize an AST of simple mathematical expressions like 1+(-(a-b)) in tree format of (+ 1 (- (- a b))) that is generated by my parser grammar. I've tried a couple of different approaches and I can't figure out where I'm going wrong. Could someone explain what's wrong with the following two expressions:
>
>     expression :(MINUS^)? term;

If I understand you correctly, you are asking about writing a tree parser grammar. Does Antlr even compile the grammar (i.e., generate a tree parser) with the above rule? I think the rule must take the same form as rewrite rules.

J
[il-antlr-interest: 29713] Re: [antlr-interest] Need some help with AST creation
Hi Luis,

You can try this:

    tokens {
        // Semantic tokens
        FIELD;
        INDEX;
    }
    ...
    fieldExpr
        : (atom -> atom)
          ( '.' identifier -> ^(FIELD $fieldExpr identifier)
          | '[' expr ']' -> ^(INDEX $fieldExpr expr)
          )*
        ;

If you need the semantic tokens to carry the input stream context data, there is a way to create them from another token, copying its context data - in this case, say, FIELD copying the context of '.' and INDEX that of '['. The notation for this escapes me for the moment, but I think the info won't be difficult to find in the wiki/documentation on Antlr's website.

Hope that helps,
Jay

Luis Pureza wrote:
> Hi,
>
> I need some help from the ANTLR wizards :)
>
> I'm trying to match expressions with field accesses and array indexes. For example:
>
>     costumers.length
>     costumers[0].address
>     costumers[costumers.length - 1].orders[0].total
>
> The following rule seems to work:
>
>     fieldExpr : atom ('.'^ identifier | ('['^ expr ']'!))*;
>
> However, it creates trees with nodes annotated with '[', and I'd prefer to have a dummy token like INDEX. For example, costumers[0] now returns
>
>     ([ (ID costumers) (INT 0))
>
> But I'd like it to return
>
>     (INDEX (ID costumers) (INT 0))
>
> I tried to create the AST manually with -> ^(...), but I ended up nowhere. Maybe I should've tried to refactor the grammar, but that would make it a little less readable, so I didn't do it.
>
> How do you suggest I do this?
>
> Thank you!
[il-antlr-interest: 29718] Re: [antlr-interest] Need some help with AST creation
Jim Idle wrote:
> You can just do this:
>
>     ddd: a=TOKEN^ B C D { $a.type = INDEX; } ;
>
> Jim

Typical C programmer mentality. :-)

Best,
J
[il-antlr-interest: 29694] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node
Gerald Rosenberg wrote:
> -- Original Message (Wednesday, August 04, 2010 5:21:33 PM) From: Junkman --
> Subject: Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node
>
>> You wrote AST ^( ^( PARENT A ) B ). Can you describe the tree this notates? I can see it as a node sequence, but I don't know what tree structure it is describing.
>>
>> Thanks for the reply.
>>
>> Jay
>
> The AST
>
>     ^( ^( PARENT A? ) B? )
>
> should implement as
>
>     ( ( PARENT Token.DOWN A? Token.UP ) Token.DOWN B? Token.UP )
>
> but is actually
>
>     ( PARENT Token.DOWN A? B? Token.UP )
>
> Because parent_a is the root of parent, the parser is (for some reason) not actually generating the middle Token.UP Token.DOWN sequence.

It's because the parser generates trees, not node streams. UP and DOWN nodes are marker nodes injected while flattening a tree, and the resulting node stream naturally contains neither an empty DOWN-UP sequence (edges to a non-existent node) nor an empty UP-DOWN sequence between sibling nodes (duplicate edges). So the parser's tree generation behavior makes sense.

What's new to me is that the tree parser interprets the rewrite expression differently (e.g., expecting the empty marker node sequences), and I think that is contrary to TDAR's suggestion that tree parser rules can, in large part, be constructed simply by preserving the rewrite expressions from the parser rules.

BTW, I found an open bug issue that may be related:

http://www.antlr.org/jira/browse/ANTLR-391

It's reported by and assigned to Terence, so perhaps he can comment on this?

> explains why P and PA work, but PB and PAB do not - after matching for A?, the tree parser is looking for UP, but finding B. Not sure why Antlr is doing this - not expected. Changing A and/or B to non-optional does not change this behavior.
>
> If, however, you change the parent rule to
>
>     parent : parent_a B? -> ^( M parent_a B? ) ;
>
> where M is an imaginary token (and make the corresponding change to the tree grammar), all four patterns will parse and match as expected:
>
>     AST: ^( M ^(PARENT A? ) B? )
>
> properly implements as
>
>     ( ( M Token.DOWN PARENT A? Token.UP ) Token.DOWN B? Token.UP )

Yes, but this is addressing a different issue - I want the tree parser to recognize my AST, rather than changing the AST to fit the tree parser.

For now, though, I think I get the gist of how the tree parser interprets the rewrite expression (differently from the parser), so I will have to update my tree parser grammar accordingly, although it's odd that my current (i.e. old) tree parser generates relatively few error messages...

Thanks for your help,
Jay

PS: This list doesn't seem the chattiest of mailing lists, but please chime in if I have it wrong above, or if you have other insight on the subject.
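
The point about UP and DOWN being markers injected while flattening a tree can be illustrated with a small, self-contained sketch. The classes below (NodeStreamSketch and its nested Node) are hypothetical and are not ANTLR's CommonTree or CommonTreeNodeStream; they only assume the flattening convention described in this message, where a node with children is followed by DOWN, its children, then UP, and a leaf contributes no markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NodeStreamSketch {
    /** Minimal tree node: a label plus zero or more children. */
    static class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-first flattening with DOWN/UP markers around non-empty child lists. */
    static void flatten(Node n, List<String> out) {
        out.add(n.text);
        if (n.children.isEmpty()) return;   // leaf: no DOWN/UP pair is emitted
        out.add("DOWN");
        for (Node c : n.children) flatten(c, out);
        out.add("UP");
    }

    static String stream(Node root) {
        List<String> out = new ArrayList<>();
        flatten(root, out);
        return String.join(" ", out);
    }
}
```

Under these assumptions a PARENT node with children A and B flattens to "PARENT DOWN A B UP", with no UP DOWN pair between the siblings, and a childless PARENT flattens to just "PARENT". That is the node stream a tree grammar rule has to match, whatever nesting the rewrite expression that built the tree appeared to have.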
[il-antlr-interest: 29668] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node
Thanks for the replies, Jim and Gerald.

Your responses and some more testing suggest the following to me:

1. I cannot nest a tree parser rule (inner rule) in another rule (outer rule), and try to have the outer rule match additional nodes in the subtree matched by the inner rule.

2. Consequently, the set of trees generated by a rewrite expression does not necessarily match the set of trees matched by the same rewrite expression in the tree parser.

Am I in the ballpark here?

Jay
[il-antlr-interest: 29671] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node
Gerald Rosenberg wrote:
> As best I understand your questions, the answers are no, no, and no . . .
>
> Given an input PAB, your given parser will construct an AST ^( ^( PARENT A ) B ) and your given tree grammar will likewise match that.

Actually, I don't think I understood you.

You wrote AST ^( ^( PARENT A ) B ). Can you describe the tree this notates? I can see it as a node sequence, but I don't know what tree structure it is describing.

Thanks for the reply.

Jay
[il-antlr-interest: 29657] [antlr-interest] Tree parser eats up DOWN node when navigating optional child node
Greetings,

I am seeing an interesting behavior in generated tree parsers. This is an example grammar fragment:

    tree grammar TTreeParser;
    ...
    parent: ^(parent_a B?) ;
    parent_a: ^(PARENT A?) ;

The intent is for parent_a to match a PARENT node optionally with the child node A, while parent is to match a PARENT node that can also have child node B as well as child node A.

But the parent rule throws a recognition exception when fed this tree:

    ^(PARENT B)

The problem is that parent_a consumes the DOWN node before B instead of skipping it.

The following tree also causes the exception for parent:

    ^(PARENT A B)

In this case, parent_a, after consuming A, expects UP when there is still another sibling node - B.

It looks like a discrepancy in the rewrite rule interpretation - when used to produce a tree, the rules work as expected/intended.

I am looking for insight/suggestions to get the tree parser to work as intended. Attached are example grammars and generated code plus a test driver to demonstrate the issue I'm having.

Thanks for any help,
Jay

    tree grammar TTreeParser;
    options { tokenVocab=T; ASTLabelType=CommonTree; }
    parent: ^(parent_a B?) ;
    parent_a: ^(PARENT A?) ;

    grammar T;
    options { output=AST; }
    PARENT: 'P' ;
    A: 'A' ;
    B: 'B' ;
    parent: parent_a B? -> ^(parent_a B?) ;
    parent_a: PARENT A? -> ^(PARENT A?) ;
[il-antlr-interest: 29211] [antlr-interest] Antlrworks with Python-like grammar
Hello,

I have a parser grammar that relies on a custom TokenStream - quite like the Python grammar posted on the Antlr website that relies on PythonTokenStream.java.

I am wondering if there is a way to run/debug the parser in AntlrWorks - it would be nice if I could make use of AntlrWorks' debugger visualization features.

Thanks for any help,
Jay
[il-antlr-interest: 29159] Re: [antlr-interest] Multiple lexer tokens per rule
In case anyone reads this thread again, the Antlr wiki has a better example for emitting multiple tokens:

http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497

Cheers.

Junkman wrote:
> Ken Williams wrote:
>> On 6/4/10 4:16 PM, Junkman j...@junkwallah.org wrote:
>>> The way nextToken() is overridden, it first returns the token matched by the rule, and subsequently any additional queued token before matching a new token in the input stream.
>>
>> Maybe I'm being dense here, but I don't think that's what it's doing:
>>
>>     public Token nextToken() {
>>         return tokenQueue.isEmpty() ? super.nextToken() : tokenQueue.poll();
>>     }
>>
>> If tokenQueue is non-empty, it always uses it. On the *next* invocation, when it's empty, it will call super.nextToken().
>
> Think of the tokens generated by a single rule invocation as a set. The set is generated in/under super.nextToken(), AFTER the queue has been tested to be empty. Among the tokens in the set, the matching token is returned first, because that's what Lexer.nextToken() (super.nextToken()) returns.
>
> If that's still not clear, I suggest you put the generated lexer under a debugger (like Jim suggested in another thread ;-) and trace it from nextToken() - it will give you a better explanation than my verbiage.
>
> Best regards.
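
The ordering argument above (the matched token comes out first, then the tokens its action queued, then the next match) can be traced with a small simulation. This is a hedged sketch in plain Java, not ANTLR code: QueueingLexerSketch, matchNext() and the "x" rule that queues "x1" and "x2" are all hypothetical stand-ins, with matchNext() playing the role of super.nextToken().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

class QueueingLexerSketch {
    private final Iterator<String> input;
    private final Queue<String> tokenQueue = new ArrayDeque<>();

    QueueingLexerSketch(String... words) {
        input = Arrays.asList(words).iterator();
    }

    /** Stand-in for super.nextToken(): matches one rule; its action may queue extras. */
    private String matchNext() {
        if (!input.hasNext()) return "EOF";
        String matched = input.next();
        if (matched.equals("x")) {   // hypothetical rule whose action emits two extra tokens
            tokenQueue.add("x1");
            tokenQueue.add("x2");
        }
        return matched;              // the matched token itself is returned first
    }

    /** Same shape as the override quoted above: drain the queue before matching again. */
    public String nextToken() {
        return tokenQueue.isEmpty() ? matchNext() : tokenQueue.poll();
    }

    /** Collects every token up to, but not including, EOF. */
    static List<String> all(String... words) {
        QueueingLexerSketch lx = new QueueingLexerSketch(words);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); !t.equals("EOF"); t = lx.nextToken()) {
            out.add(t);
        }
        return out;
    }
}
```

Feeding the words a, x, b through all() produces a, x, x1, x2, b: the queue is empty when "x" is matched, so "x" is returned directly, and the two queued tokens are handed out on the following calls before "b" is matched.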
[il-antlr-interest: 29131] Re: [antlr-interest] Parsing whole-line comments?
It's probably better to keep the lexer simple - just convert the character stream into a token stream - and push contextual constraints like beginning of the line into parsing rules, like this:

    /* Tokens */
    NEWLINE: '\n' ;
    E: 'E';
    C: 'C';
    CALL: 'CALL'; // default greediness ensures CALL is matched as CALL instead of C.

    /* Parsing rules */
    stmt
        : E ... NEWLINE
        | C ... NEWLINE
        | CALL ... NEWLINE
        ;

Use stmt as the start symbol for the parser, and you have imposed contextual rules for tokens by defining what the valid stmt's are.

Christian Convey wrote:
>>> That is, beginning of line, the letter C, zero or more non-end-of-line characters, end-of-line.
>>>
>>> My problem is, to my knowledge ANTLR won't let me define tokens that match on the beginning of a line ('^'). Any suggestions?
>>
>> There is no need to match such positions: when you match a certain line (a token that ends with a line break), the next character will be the first in a (new) line. Something like this should do the trick:
>>
>>     grammar Test;
>>     parse : (Comment | Line)+ EOF ;
>>     Comment : 'C' ~('\r' | '\n')* (NewLine | EOF) ;
>>     Line : ~'C' ~('\r' | '\n')* (NewLine | EOF) ;
>>     fragment NewLine : '\r'? '\n' | '\r' ;
>
> Thanks, that may work for my particular language, because I may have no other tokens that begin with a capital letter 'C'.
>
> But let me wax hypothetical for a minute. Suppose that in other, non-comment lines, I need to support another token that begins with a capital C. For example, 'CALL'. So my DSL might have a program like this:
>
>     C My test
>       E
>       CALL FOO
>     CALL This is a comment because 'C' is in first column.
>
> Any suggestions for how an ANTLR lexeme/grammar should handle this?
>
> My impression is that something like Flex, whose token regex's can match the beginning-of-line imaginary character, would just let me do this:
>
>     CommentToken ::= ^C.*$
>     CallToken ::= ~(^)CALL
[il-antlr-interest: 29133] Re: [antlr-interest] Parsing whole-line comments?
Christian Convey wrote:
>>     /* Tokens */
>>     NEWLINE: '\n' ;
>>     E: 'E';
>>     C: 'C';
>>     CALL: 'CALL'; // default greediness ensures CALL is matched as CALL instead of C.
>
> Thanks, but 'C' can also be the name of a variable, as long as it's not in the first column. So I don't think greediness solves the whole problem.

I wonder if this would work better in that case:

    /* Tokens */
    NEWLINE: '\n' ;

    /* Parsing rules */
    stmt
        : 'E' ... NEWLINE
        | 'C' ... NEWLINE
        | 'CALL' ... NEWLINE
        ;

Not sure, since I don't know how explicitly defined tokens are treated differently from tokens implicitly defined in parsing rules.

Alternatively, you can apply a semantic predicate to lexer rules like this:

    C: { $pos == 0 }?=> 'C' ;

It should only match C at the beginning of the line, but I found (in my noob experience) that semantic predicates can be pretty tricky due to the hoisting business and how it affects prediction DFA construction - I'm sure more experienced hands can tell you better.

Good luck.
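
The effect the column-zero predicate is after can be shown with a deliberately simplified, line-at-a-time sketch. ColumnZeroSketch and its lex method are hypothetical and bypass ANTLR entirely; the sketch assumes the rule from the thread, namely that a 'C' in the first column makes the whole line a comment, while the same letters elsewhere are ordinary tokens.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnZeroSketch {
    /** Tokenizes one line; a 'C' at character position 0 turns the line into a comment. */
    static List<String> lex(String line) {
        List<String> out = new ArrayList<>();
        if (line.startsWith("C")) {          // the "position in line == 0" test
            out.add("COMMENT");
            return out;
        }
        for (String w : line.trim().split("\\s+")) {
            if (w.isEmpty()) continue;
            out.add(w.equals("CALL") ? "CALL" : "ID(" + w + ")");
        }
        return out;
    }
}
```

Here "CALL This is a comment" lexes to a single COMMENT, whereas an indented "  CALL C" lexes to CALL followed by ID(C), so C remains usable as a variable name away from the first column.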
[il-antlr-interest: 29115] Re: [antlr-interest] Multiple lexer tokens per rule
Try this to get you started:

    @lexer::members {
        // Queue to hold additional tokens
        private java.util.Queue<Token> tokenQueue = new java.util.LinkedList<Token>();

        // Include queue in reset().
        public void reset() {
            super.reset();
            tokenQueue.clear();
        }

        // Queued tokens are returned before matching a new token.
        public Token nextToken() {
            if (tokenQueue.peek() != null)
                return tokenQueue.poll();
            return super.nextToken();
        }
    }

    MATCHED_TOKEN: ...
        {
            // Add additional tokens to the queue.
            tokenQueue.add( new CommonToken( ... ) );
        }
        ;

MATCHED_TOKEN is returned first, and additional tokens queued by MATCHED_TOKEN's action are returned subsequently, before matching new tokens in the input stream.

Instantiate the additional tokens accordingly if you need input stream context - see Lexer.emit().

Ken Williams wrote:
> On 6/3/10 4:18 PM, Jim Idle j...@temporal-wave.com wrote:
>> Add to an array or collection then get nextToken to remove from the collection. It is slower to do this so it isn't the default way.
>
> Yeah, that's what the book says. =) It seems like there are some subtleties involved, though - there's a lot of bookkeeping in nextToken() that looks kind of scary (e.g. the current-line-number stuff, the default-channel stuff, etc.), and if I override it I'm really not confident I'll do it correctly. I'm also unsure how mTokens(), emit(), and nextToken() cooperate with their member variables.
>
> I tried this simple-minded implementation, and started getting out-of-bounds exceptions:
>
>     @lexer::members {
>         List<Token> tokBuf = new ArrayList<Token>();
>         public Token nextToken() {
>             while (tokBuf.isEmpty()) { emit(); }
>             return tokBuf.remove(0);
>         }
>         public void emit(Token token) { tokBuf.add(token); }
>     }
>
> So if someone does have a working example, I'd love to see it!
[il-antlr-interest: 29072] Re: [antlr-interest] Comments, EOF, and Debugger
Disclaimer: I'm a noob. :)

Taking the newline out of the comment rule seems to work, like this:

    COMMENT
        : '#' (~( '\r' | '\n' ))*
        ;

    NEWLINE
        : '\r'? '\n'
          {
              // kick it off to the hidden channel
              // $channel=HIDDEN;
              // or skip it altogether
              // skip();
          }
        ;

A comment on the last line, terminated by EOF, presents no problem. I've seen this pattern for comments in other examples.

I don't know how/why the debugger lexer changes the outcome, but I assume you can always trace the generated lexers to see how the different outcomes result.

J

Nathan Eloe wrote:
> On Jun 1, 2010, at 1:33 PM, ante...@freemail.hu wrote:
>> On 6/1/2010 3:33 PM, Nathan Eloe wrote:
>>> Hello all,
>>>
>>> I'm working on an AST parser for the Bash language and I've come across the following strange behavior:
>>>
>>> I'm trying to handle comments, so I used the comments token you can get when you start a new grammar in ANTLRWorks. It works.
>>>
>>>     COMMENT : '#' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;} ;
>>>
>>> The problem arises when the comment is the last thing in the input (i.e., no newline before EOF). Removing the '\n' from the token causes it to freak out when I run the tests, but I can't get it to match comments at the end of the file. Leaving that '\n' in lets the code compile, but I still can't match that last case.
>>>
>>> Here's where the interesting part happens. When I run it through the debugger with the same test case that I use in gunit, the debugger allows the input and parses it correctly (meaning, it ignores it as it should) and correctly generates the expected AST. Does the debugger allow the code to be more robust in its decision making abilities? Or does it do something to the input to allow it to be matched to a token?
>>>
>>> Thanks for the help!
>>>
>>> Nathan
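
Why dropping the newline from the comment rule fixes the end-of-file case can be checked with ordinary regular expressions. This is only an analogy, using java.util.regex rather than an ANTLR lexer: the two patterns below are hand-written approximations of the two COMMENT rules in this thread, and the class name CommentAtEofSketch is hypothetical.

```java
import java.util.regex.Pattern;

class CommentAtEofSketch {
    // Approximates: COMMENT : '#' ~('\n'|'\r')* '\r'? '\n' ;
    static final Pattern WITH_NEWLINE = Pattern.compile("#[^\\r\\n]*\\r?\\n");

    // Approximates: COMMENT : '#' (~('\r'|'\n'))* ;
    static final Pattern WITHOUT_NEWLINE = Pattern.compile("#[^\\r\\n]*");

    /** True if the pattern matches the entire string. */
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

A final line such as "# last line" with no trailing newline fails the first pattern, because the mandatory newline is never found, but satisfies the second. With the second form the line terminator is left for a separate NEWLINE rule to consume.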
[il-antlr-interest: 29030] Re: [antlr-interest] greedy subrule option idiom
Here is another variation of the grammar:

    grammar Test;
    fragment CHAR: '\u0000'..'#' | '$'..'\uFFFF' ;
    STRING : '##' ( options {greedy=false;} : CHAR )* '##' ;
    stmt: ( . )+ ;

This generates a grammar check error just like the one in my previous post (attached at the bottom).

The error goes away if I pull the character '#' out of CHAR and inline it into STRING with the '|' operator next to CHAR, like this:

    grammar Test;
    fragment CHAR: '\u0000'..'"' | '$'..'\uFFFF' ;
    STRING : '##' ( options {greedy=false;} : CHAR | '#' )* '##' ;
    stmt: ( . )+ ;

It looks like the DFA needs '#' at the top level of the greedy subrule because the character also matches the beginning of the exit branch (and hence requires more lookahead to decide).

I'd like to know if this is known (and consistent) behavior. Or perhaps I'm way off because I missed something very basic in the grammars above.

I did a quick search of the list archive using the MarkMail link Jim provided, and did find a recent thread on non-greedy loops, but it concerns a suggestion for v4 and I'm not sure it's directly applicable to this question.

Sorry if it seems like I'm beating a dead horse. Being a noob makes me want to dot every i and j twice.

Junkman wrote:
> Hello,
>
> The following grammar generates an error:
>
>     grammar Test;
>     fragment CHAR : . ;
>     STRING: '"' ( options {greedy=false;} : CHAR )* '"' ;
>     stmt : ( . )+ ;
>
> The error message generated by the Check Grammar option of ANTLRWorks (1.4) is:
>
>     [15:34:52] error(201): Test.g:6:47: The following alternatives can never be matched: 2
>
> I think it means it cannot exit the non-greedy subrule (of the lexer rule STRING).
>
> If I substitute . directly for CHAR, no error.
>
> Is this the expected behavior? Is there a problem with the grammar given above?
>
> Thanks for any insight/assistance.
>
> J
[il-antlr-interest: 29006] [antlr-interest] greedy subrule option idiom
Hello,

Following is a lexer rule to match a quoted string that allows backslash escape sequences:

    STRING : '"' ( options {greedy=false;} : ( ~ '\\' | '\\' . ) )* '"' ;

It seems to work. But if you put the '*' operator inside the subrule like this:

    STRING : '"' ( options {greedy=false;} : ( ~ '\\' | '\\' . )* ) '"' ;

it eats up everything to EOF.

It's as if the greedy option applies to ((subrule)*) instead of the subrule itself, and only if the subrule is suffixed with the '*' operator (or with '+') externally, as in (subrule)*. To my eyes, the second version seems the correct one.

Thoughts?

J
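As a rough illustration of the greedy versus non-greedy difference described above, here is a sketch that uses java.util.regex rather than ANTLR. It is only an analogy: the class name GreedyDemo and the sample input are made up for this example, and a regex engine backtracks, so the greedy pattern stops at the last quote, whereas the post reports the ANTLR rule consuming input to EOF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of ANTLR or of the original post.
class GreedyDemo {
    // Loop body mirrors the rule: any char except backslash, or backslash + any char.
    // "*?" is the reluctant (non-greedy) quantifier: try to exit the loop first.
    static final Pattern NON_GREEDY =
            Pattern.compile("\"(?:[^\\\\]|\\\\.)*?\"", Pattern.DOTALL);

    // Same body with a greedy "*": consume as much as possible, then back off.
    static final Pattern GREEDY =
            Pattern.compile("\"(?:[^\\\\]|\\\\.)*\"", Pattern.DOTALL);

    // Returns the first match of the pattern in the input, or null if none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The input text is: "a\"b" rest "c"
        String input = "\"a\\\"b\" rest \"c\"";
        System.out.println(firstMatch(NON_GREEDY, input));
        System.out.println(firstMatch(GREEDY, input));
    }
}
```

With that input, the non-greedy pattern returns only the first quoted string (including its escaped quote), while the greedy one spans from the first quote to the last, because the "any char except backslash" alternative also matches the closing quote.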
[il-antlr-interest: 28994] Re: [antlr-interest] Dynamic scope for lexer rule
Thanks for the reply, Jim. I understand the rationale for your suggestion.

Might this info be worth adding to the wiki? It may be obvious to seasoned hands, but the difference in constraints between lexer and parser rules would be helpful to document, especially since both rule types share the same basic syntax in Antlr.

Junkman

Jim Idle wrote:

Scopes are not supported for lexer rules; you need to implement your own things to do this, but try to leave any kind of context out of the lexer if you can. You want to push such things as high up the tool chain as you can. It isn't always possible though.

Jim

-----Original Message-----
From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-boun...@antlr.org] On Behalf Of Junkman
Sent: Monday, May 24, 2010 3:22 PM
To: antlr-interest@antlr.org
Subject: Re: [antlr-interest] Dynamic scope for lexer rule

Greetings,

Let me raise the question again. Sorry that this is becoming something of a pattern for me.

Adding a dynamically scoped attribute to a lexer rule seems to generate the error message (shown at the bottom as part of my previous post on this subject) when generating recognizers. The grammar is as follows:

    grammar Junkscript;

    NEWLINE
    @init { $channel=HIDDEN; }
        : '\n'
        ;

    COMMENT
    /*
    scope { String dynamic; }
    @init { $COMMENT::dynamic = null; }
    */
        : '#' ( options {greedy=false;} : (~ NEWLINE)* )
        ;

    stmt : ( . )+ ;

The simple grammar works fine, but with the scope section (along with the init action) under COMMENT uncommented, Antlr generates the error.

Are dynamically scoped attributes allowed for lexer rules? If so, what is the error in the grammar above?

Thanks for any assistance.

Junkman

Junkman wrote:

Greetings,

I've added an attribute with dynamic scoping to a lexer rule, and when generating code, I'm encountering an internal error. Listed below is the partial call stack reported:

    error(10): internal error: Junkscript.g : java.lang.NullPointerException
    org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
    at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
    at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
    ...

Is dynamic scoping allowed for lexer rule attributes?

Thanks for any info.

J
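Following Jim's suggestion to implement your own mechanism, one possible shape is a hand-maintained stack of per-rule state. This is a minimal sketch, assuming plain Java outside of ANTLR; the names CommentScopeStack, CommentScope, enter, exit, current, and depth are hypothetical and are not part of the ANTLR runtime.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, hand-rolled stand-in for a dynamic scope on the COMMENT rule.
class CommentScopeStack {
    // One frame of state, mirroring "scope { String dynamic; }" from the post.
    static final class CommentScope {
        String dynamic; // starts out null, like the @init action in the post
    }

    private final Deque<CommentScope> frames = new ArrayDeque<>();

    // Call on rule entry: pushes a fresh frame.
    void enter() {
        frames.push(new CommentScope());
    }

    // The innermost active frame, or null if no rule is active.
    CommentScope current() {
        return frames.peek();
    }

    // Call on rule exit: discards the innermost frame.
    void exit() {
        frames.pop();
    }

    // Number of active frames.
    int depth() {
        return frames.size();
    }
}
```

An instance could presumably be declared as a field of the lexer (for example in an `@lexer::members` section) and driven from rule actions, but that wiring is an assumption and is not shown or tested here.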
[il-antlr-interest: 0] Re: [antlr-interest] Dynamic scope for lexer rule
Greetings,

Let me raise the question again. Sorry that this is becoming something of a pattern for me.

Adding a dynamically scoped attribute to a lexer rule seems to generate the error message (shown at the bottom as part of my previous post on this subject) when generating recognizers. The grammar is as follows:

    grammar Junkscript;

    NEWLINE
    @init { $channel=HIDDEN; }
        : '\n'
        ;

    COMMENT
    /*
    scope { String dynamic; }
    @init { $COMMENT::dynamic = null; }
    */
        : '#' ( options {greedy=false;} : (~ NEWLINE)* )
        ;

    stmt : ( . )+ ;

The simple grammar works fine, but with the scope section (along with the init action) under COMMENT uncommented, Antlr generates the error.

Are dynamically scoped attributes allowed for lexer rules? If so, what is the error in the grammar above?

Thanks for any assistance.

Junkman

Junkman wrote:

Greetings,

I've added an attribute with dynamic scoping to a lexer rule, and when generating code, I'm encountering an internal error. Listed below is the partial call stack reported:

    error(10): internal error: Junkscript.g : java.lang.NullPointerException
    org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
    at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
    at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
    ...

Is dynamic scoping allowed for lexer rule attributes?

Thanks for any info.

J
[il-antlr-interest: 28966] [antlr-interest] Dynamic scope for lexer rule
Greetings,

I've added an attribute with dynamic scoping to a lexer rule, and when generating code, I'm encountering an internal error. Listed below is the partial call stack reported:

    error(10): internal error: Junkscript.g : java.lang.NullPointerException
    org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
    at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
    at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
    at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
    ...

Is dynamic scoping allowed for lexer rule attributes?

Thanks for any info.

J
[il-antlr-interest: 28939] Re: [antlr-interest] Referencing attributes
Sorry for the dupe, but I'm hoping to get some/any response.

Are attribute references allowed outside actions and action-like elements (e.g., semantic predicates), other than as parameters in rule invocation?

Thanks for any info.

J

Junkman wrote:

Greetings,

I'm an Antlr noob, and have a question regarding accessing attributes.

Where, outside of actions, can you reference attributes? One place seems to be as a parameter to a rule invocation, like this:

    decl : type declarator[ $type.text ] ';' ;

This is from The Definitive Antlr Reference, page 119.

Is that true in general? Are there other locations outside of actions where attributes can be accessed?

As noted, I am a noob to Antlr and just joined this list. Please let me know if this email's question/topic is not appropriate to the list.

Thanks.
[il-antlr-interest: 28914] [antlr-interest] Referencing attributes
Greetings,

I'm an Antlr noob, and have a question regarding accessing attributes.

Where, outside of actions, can you reference attributes? One place seems to be as a parameter to a rule invocation, like this:

    decl : type declarator[ $type.text ] ';' ;

This is from The Definitive Antlr Reference, page 119.

Is that true in general? Are there other locations outside of actions where attributes can be accessed?

As noted, I am a noob to Antlr and just joined this list. Please let me know if this email's question/topic is not appropriate to the list.

Thanks.
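To make the rule-argument case concrete, here is a hand-written recursive-descent sketch of the `decl` rule in which the text matched by `type` is passed to `declarator` as an ordinary method argument. It is an analogy under stated assumptions, not ANTLR-generated code: the class DeclParser, the pre-split token list, and the returned description string are all invented for illustration.

```java
import java.util.List;

// Hand-written analog of: decl : type declarator[ $type.text ] ';' ;
class DeclParser {
    private final List<String> tokens; // assumed to be already tokenized
    private int pos = 0;

    DeclParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // decl: the text matched by type() is handed to declarator() as an argument.
    String decl() {
        String typeText = type();
        String described = declarator(typeText);
        expect(";");
        return described;
    }

    // type: consumes one token and returns its text (the analog of $type.text).
    String type() {
        return next();
    }

    // declarator[String typeText]: the rule parameter becomes a method parameter.
    String declarator(String typeText) {
        String name = next();
        return name + " has type " + typeText;
    }

    // Returns the next token, failing if the input is exhausted.
    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    // Consumes one token and checks that it is the expected text.
    private void expect(String text) {
        String actual = next();
        if (!actual.equals(text)) {
            throw new IllegalStateException(
                    "expected '" + text + "' but found '" + actual + "'");
        }
    }
}
```

For the token list `int`, `x`, `;`, calling `decl()` returns "x has type int". The sketch only covers the parameter-passing case quoted from the book; it says nothing about other places where attribute references may or may not be allowed.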