[il-antlr-interest: 30260] Re: [antlr-interest] How to force error recovery?

2010-10-04 Thread Junkman
Greetings,

I ran into the same issue, and you probably noticed that, when the
lookahead doesn't match a statement, it breaks out of * loop and tries
to match EOF.

I resorted to calling statement() in a loop to force continuation of
parsing regardless of error, instead of calling compilationUnit().
Seems to work well enough.

It would be good to know if there is a better way to handle this, though.
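
For illustration, the loop approach can be sketched without the ANTLR runtime. This is only a rough sketch under stated assumptions: the class StatementLoopSketch, its hand-written statement() method, and the "drop one token" recovery are hypothetical stand-ins for the generated parser and whatever recovery it applies, not ANTLR's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated parser; grammar: statement : A | B C ;
final class StatementLoopSketch {
    private final String[] tokens;   // whitespace-separated token texts
    private int pos = 0;
    final List<String> statements = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    StatementLoopSketch(String input) {
        String trimmed = input.trim();
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Matches exactly one statement or throws without consuming anything.
    private void statement() {
        String t = tokens[pos];
        if (t.equals("a")) {
            pos++;
            statements.add("a");
        } else if (t.equals("b") && pos + 1 < tokens.length && tokens[pos + 1].equals("c")) {
            pos += 2;
            statements.add("b c");
        } else {
            throw new IllegalStateException("unexpected '" + t + "' at token " + pos);
        }
    }

    // Driver loop: call statement() repeatedly instead of a single top-level rule.
    void parseAll() {
        while (pos < tokens.length) {
            try {
                statement();
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
                pos++; // crude recovery (an assumption): drop the offending token
            }
        }
    }

    public static void main(String[] args) {
        StatementLoopSketch p = new StatementLoopSketch("a c a");
        p.parseAll();
        System.out.println(p.statements + " " + p.errors);
    }
}
```

With the input "a c a" the loop records one error for c and still picks up both a statements, which is the behavior an IDE front end would want.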

Best,

J


On 10/4/2010 3:27 PM, Edson Tirelli wrote:
Hi all,
 
Look at this simple grammar:
 
 grammar testGrammar;
 options {
   output=AST;
 }
 
 compilationUnit
   : statement* EOF
   ;
   
 statement
   :   A^
   |   B^ C
   ;   
 
 A   :   'a';
 
 B   : 'b';
 
 C   :   'c';  
 
 WS  :   ( ' '
 | '\t'
 | '\r'
 | '\n'
 ) {$channel=HIDDEN;}
 ;
 
 
 Using the above grammar, it will successfully parse an input like:
 
 a b c a
 
 Now, if the input is:
 
 a c a
 
 The generated parser will parse a, and will fail at c, as it
 is not a valid statement. Reading the error recovery chapter on the
 ANTLR book, I would imagine ANTLR would delete/skip the c token and
 try to recover, successfully parsing the second a, as that is a
 valid statement again. But it is not working like this. It is aborting
 the parsing with an error at c.
 
 Question: how do I force it to recover from the error and continue 
 parsing?
 
 The actual scenario is that the parser I am working on is used by
 an IDE environment (eclipse), so we need it to continue parsing and
 presenting the users with all the errors found in the file, not just
 the first one. The error recovery seems to work on some rules, but not
 on the top rule (compilationUnit).
 
 Thanks,
Edson
 


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
il-antlr-interest group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 30262] Re: [antlr-interest] How to force error recovery?

2010-10-04 Thread Junkman
Thanks for passing on the wiki link.  It's definitely smarter than the
crude approach I took, which requires additional tree assembly as well
as error recovery adjustment to eat up unexpected tokens.

J

On 10/4/2010 4:26 PM, Edson Tirelli wrote:
Thanks for the suggestion. I just found this:
 
 http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
 
I am trying to check if it works for my case. Otherwise I will try
 your approach.
 
Edson
 




[il-antlr-interest: 30152] Re: [antlr-interest] Literal - ID clash resolution

2010-09-14 Thread Junkman
I'd guess the common first stab would be to have '=' as a distinct token
and elevate ID into a parsing symbol:

EQUAL: '=';
id: (ID | EQUAL) ;
option: optionName=id EQUAL STRING ;

It might become a bit more interesting if you also need to make optionName
optional.

But maybe you could rewind the stack a bit, and reconsider if you really
want/need '=' as a valid identifier?

J

On 9/14/2010 10:15 AM, Bill Andersen wrote:
 Folks
 
 I'm having a small problem.   Not that I can't solve it myself but it's one 
 of those things for which:
 
 a) I'm sure there exists a good stock solution, and
 
 b) Google is especially poorly suited to find in a search
 
 Here it is.  I have rules in the grammar for my DSL that have '=' as a 
 literal appearing in them.  Like this:
 
 option
   : optionName=ID '=' STRING
   ;
 
 The DSL parses a language specification and that specification can define 
 reserved words, one of which (in my test case) is '='.  This creates a 
 problem: The DSL grammar must recognize '=' as an instance of identifier (ID 
 - I'm using ANTLRWorks default lexer rule template, slightly modified, for 
 now) but it can't recognize '=' as such because it's already a literal used 
 in the DSL grammar.
 
 Can anyone tell me what the best way to deal with this is?
 
 If my explanation doesn't make sense (seems mine often don't for some reason) 
 I'll be glad to post the whole grammar, but I don't think that's necessary.
 
   .bill
 
 
 
 



[il-antlr-interest: 29778] Re: [antlr-interest] Doubt About rewrite rulse

2010-08-10 Thread Junkman
Hi Victor,

Victor Giordano wrote:
 Hi, I am a newbie. Trying to figure out how to work with AST trees and 

 ...

 but if I want to use rewrite rules... how do I treat the 
 repetition EBNF operators like * or +?
 
 expr : term (('+'|'-') term)* -> term ^(('+'|'-') term)* ;
 

try this:

expr: ( term -> term )
  ( ( '+' | '-' ) term -> ^( ( '+' | '-' ) $expr term ) )* ;


Not sure if the terms need to be distinguished with labels.

The Antlr reference book describes the use of rewrite rule inside
subrule in more detail.

J




[il-antlr-interest: 29726] Re: [antlr-interest] Building a tree grammar expression to recognize arithmetic expressions

2010-08-07 Thread Junkman
Hi Alex,

Alex Storkey wrote:
 Hi, it's my first time posting in a mailing list like this so go easy on me
 if I'm breaking some etiquette or anything :)
 
 I'm trying to construct an expression in my tree grammar to recognize an AST
 of simple mathematical expressions like 1+(-(a-b)) in tree format of (+ 1 (-
 (- a b))) that is generated by my parser grammar.
 
 I've tried a couple of different approaches and I can't figure out where I'm
 going wrong. Could someone explain what's wrong with the following two
 expressions:
 expression
 :(MINUS^)? term;

If I understand you correctly, you are asking about writing tree parser
grammar.

Does Antlr even compile the grammar (i.e., generate a tree parser) with
the above rule?  I think the rule must be of the form of rewrite rules.

J






[il-antlr-interest: 29713] Re: [antlr-interest] Need some help with AST creation

2010-08-06 Thread Junkman
Hi Luis,

You can try this:

tokens {
// Semantic tokens
FIELD;
INDEX;
}

...

fieldExpr: (atom -> atom)
( '.' identifier -> ^(FIELD $fieldExpr identifier)
| '[' expr ']' -> ^(INDEX $fieldExpr expr)
)*
;

If you need the semantic tokens to have the input stream context data,
there is a way to create them out of another token, copying its context
data, for example in this case say FIELD to copy the context of '.' and
INDEX to '['.  The notation for this escapes me for the moment, but I
think the info won't be difficult to find in the wiki/documentation on
Antlr's website.

Hope that helps,

Jay

Luis Pureza wrote:
 Hi,
 
 I need some help from the ANTLR wizards :)
 
 I'm trying to match expressions with field accesses and array indexes.
 For example:
 
 costumers.length
 costumers[0].address
 costumers[costumers.length - 1].orders[0].total
 
 
 The following rule seems to work:
 
 fieldExpr  : atom ('.'^ identifier | ('['^ expr ']'!))*;
 
 However, it creates trees with nodes annotated with '[', and I'd
 prefer to have a dummy token like INDEX. For example, costumers[0] now
 returns
 
 ([ (ID costumers) (INT 0))
 
 But I'd like it to return
 
 (INDEX (ID costumers) (INT 0))
 
 I tried to create the AST manually with -> ^(...), but I ended up
 nowhere. Maybe I should've tried to refactor the grammar, but that
 would make it a little less readable, so I didn't do it.
 How do you suggest I do this?
 
 Thank you!
 



[il-antlr-interest: 29718] Re: [antlr-interest] Need some help with AST creation

2010-08-06 Thread Junkman
Jim Idle wrote:
 You can just do this:
 
 
 ddd: a=TOKEN^ B C D { $a.type = INDEX; } ;
 
 Jim
 

Typical C programmer mentality.  :-)

Best,

J




[il-antlr-interest: 29694] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node

2010-08-05 Thread Junkman
Gerald Rosenberg wrote:
  -- Original Message (Wednesday, August 04, 2010 5:21:33 PM) From:
 Junkman --
 Subject: Re: [antlr-interest] Tree parser eats up DOWN node when navigating
 optional child node

 You wrote AST ^( ^( PARENT A ) B ).  Can you describe the tree this
 notates?  I can see it as a node sequence, but I don't know what tree
 structure it is describing.

 Thanks for the reply.

 Jay

 
 The AST
 
 ^( ^( PARENT A? ) B? )
 
 should implement as
 
 ( ( PARENT Token.DOWN A? Token.UP ) Token.DOWN B? Token.UP )
 
 but is actually
 
 ( PARENT Token.DOWN A? B? Token.UP )
 
 Because parent_a is the root of parent, the parser is (for some reason)
 not actually generating the middle Token.UP Token.DOWN sequence.

It's because the parser generates trees, not node streams.

UP and DOWN nodes are marker nodes injected while flattening a tree, and
the resulting node stream naturally will contain neither empty DOWN-UP
sequence (edges to non-existing node) nor empty UP-DOWN sequence between
sibling nodes (duplicate edges).
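
To make the flattening concrete, here is a minimal Java sketch that does not use the ANTLR runtime. The Node class and flatten() method are hypothetical stand-ins for a tree node and a node stream; the only point is that DOWN and UP are emitted around a child list when, and only when, that list is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NodeStreamSketch {
    // Minimal tree node; a hypothetical stand-in for a real AST node class.
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // Depth-first flattening: DOWN/UP bracket a child list only if it is non-empty.
    static void flatten(Node n, List<String> out) {
        out.add(n.text);
        if (n.children.isEmpty()) {
            return; // a leaf contributes no DOWN-UP pair
        }
        out.add("DOWN");
        for (Node c : n.children) {
            flatten(c, out);
        }
        out.add("UP");
    }

    static List<String> flatten(Node n) {
        List<String> out = new ArrayList<>();
        flatten(n, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(flatten(new Node("PARENT")));
        System.out.println(flatten(new Node("PARENT", new Node("B"))));
        System.out.println(flatten(new Node("PARENT", new Node("A"), new Node("B"))));
    }
}
```

In this sketch ^(PARENT A B) flattens to PARENT DOWN A B UP, with a single DOWN-UP pair shared by both children, so there is no UP-DOWN sequence between A and B for a nested rule to consume.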

So the parser's tree generation behavior makes sense.  What's new to me
is that the tree parser interprets the rewrite expression differently (e.g.,
expecting the empty marker node sequences), and I think that is contrary
to TDAR's suggestion that tree parser rules can, in large part, be
constructed simply by preserving the rewrite expressions from the parser
rules.

BTW, I found an open bug issue that may be related:

http://www.antlr.org/jira/browse/ANTLR-391

It's reported by and assigned to Terence, so perhaps he can comment on
this?

 explains why P and PA work, but PB and PAB do not - after matching for
 A?, the tree parser is looking for UP, but finding B.  Not sure why
 Antlr is doing this - not expected.
 
 Changing A and/or B to non-optional does not change this behavior.
 
 If, however, you change the parent rule to
 
 parent : parent_a B? -> ^( M parent_a B? )  ;
 
 where M is an imaginary token (and make the corresponding change to the
 tree grammar), all four patterns will parse and match as expected:
 
 AST:
 
 ^( M ^(PARENT A? ) B? )
 
 properly implements as
 
 ( ( M Token.DOWN PARENT A? Token.UP ) Token.DOWN B? Token.UP )
 
 

Yes, but this is addressing a different issue - I want the tree parser
to recognize my AST, rather than changing the AST to fit the tree parser.

For now, though, I think I get the gist of how tree parser interprets
the rewrite expression (differently than parser), so I will have to
update my tree parser grammar accordingly, although it's odd that my
current (i.e. old) tree parser generates relatively few error messages...

Thanks for your help,

Jay

PS: This list doesn't seem the chattiest of mailing lists, but please
chime in if I have it wrong above, or if you have other insight on the
subject.




[il-antlr-interest: 29668] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node

2010-08-04 Thread Junkman
Thanks for the replies, Jim & Gerald.

Your responses and some more testing suggest the following to me:

1. I cannot nest a tree parser rule (inner rule) in another rule
(outer rule), and try to have the outer rule match additional nodes in
the subtree matched by the inner rule.

2. Consequently, the set of trees generated by rewrite expression does
not necessarily match the set of trees matched by the same rewrite
expression in the tree parser.

Am I in the ballpark here?

Jay




[il-antlr-interest: 29671] Re: [antlr-interest] Tree parser eats up DOWN node when navigating optional child node

2010-08-04 Thread Junkman
Gerald Rosenberg wrote:
  As best I understand your questions, the answers are no, no, and no . . .
 
 Given an input PAB, your given parser will construct an AST  ^( ^(
 PARENT A ) B ) and your given tree grammar will likewise match that.
 

Actually, I don't think I understood you.

You wrote AST ^( ^( PARENT A ) B ).  Can you describe the tree this
notates?  I can see it as a node sequence, but I don't know what tree
structure it is describing.

Thanks for the reply.

Jay




[il-antlr-interest: 29657] [antlr-interest] Tree parser eats up DOWN node when navigating optional child node

2010-08-03 Thread Junkman
Greetings,

I am seeing an interesting behavior in generated tree parsers.

This is an example grammar fragment:


tree grammar TTreeParser;

...

parent: ^(parent_a B?) ;
parent_a: ^(PARENT A?) ;


The intent is for parent_a to match a PARENT node optionally with the
child node A, while parent is to match a PARENT node that can also have
child node B as well as child node A.

But the parent rule throws a recognition exception when fed this tree:

^(PARENT B)

The problem is parent_a consumes the DOWN node before B instead of
skipping it.

The following tree also causes the exception for parent:

^(PARENT A B)

In this case, parent_a, after consuming A, expects UP when there is
still another sibling node - B.

It looks like a discrepancy in the rewrite rule interpretation - when
used to produce tree, the rules work as expected/intended.

I am looking for insight/suggestions to get the tree parser to work as
intended.  Attached are example grammars and generated code plus test
driver to demonstrate the issue I'm having.

Thanks for any help,

Jay


tree grammar TTreeParser;

options {
tokenVocab=T;
ASTLabelType=CommonTree;
}


parent: ^(parent_a B?) ;
parent_a: ^(PARENT A?) ;


grammar T;

options {
output=AST;
}

PARENT: 'P' ;
A: 'A' ;
B: 'B' ;


parent: parent_a B? -> ^(parent_a B?) ;
parent_a: PARENT A? -> ^(PARENT A?) ;




[il-antlr-interest: 29211] [antlr-interest] Antlrworks with Python-like grammar

2010-06-18 Thread Junkman
Hello,

I have a parser grammar that relies on a custom TokenStream - quite like
the Python grammar posted on the Antlr website that relies on
PythonTokenStream.java.

I am wondering if there is a way to run/debug the parser in AntlrWorks -
it would be nice if I can make use of AntlrWorks' debugger visualization
features.

Thanks for any help,

Jay




[il-antlr-interest: 29159] Re: [antlr-interest] Multiple lexer tokens per rule

2010-06-08 Thread Junkman
In case anyone reads this thread again, Antlr wiki has a better example
for emitting multiple tokens:

http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497

Cheers.

Junkman wrote:
 Ken Williams wrote:

 On 6/4/10 4:16 PM, Junkman j...@junkwallah.org wrote:
 The way nextToken() is overridden, it first returns the token matched by
 the rule, and subsequently any additional queued token before matching a
 new token in the input stream.
 Maybe I'm being dense here, but I don't think that's what it's doing:

 public Token nextToken() {
 return tokenQueue.isEmpty() ? super.nextToken() : tokenQueue.poll();
 }

 If tokenQueue() is non-empty, it always uses it.  On the *next* invocation,
 when it's empty, it will call super.nextToken().


 
 Think of tokens generated by a single rule invocation as a set.  The set
 is generated in/under super.nextToken(), AFTER the queue has been
 tested to be empty.  Among the tokens in the set, the matching token
 is returned first, because that's what Lexer.nextToken()
 (super.nextToken()) returns.
 
 If that's still not clear, I suggest you put the generated lexer under a
 debugger (like Jim suggested in another thread ;-) and trace it from
 nextToken() - will give you better explanation than my verbiage.
 
 Best regards.
 





[il-antlr-interest: 29131] Re: [antlr-interest] Parsing whole-line comments?

2010-06-06 Thread Junkman
It's probably better to keep lexer simple - just convert character
stream into a token stream - and push contextual constraints like
beginning of the line into parsing rules, like this:


/* Tokens */
NEWLINE: '\n' ;
E:  'E';
C:  'C';
CALL: 'CALL';
// default greediness ensures CALL is matched as CALL instead of C.


/* Parsing rules */
stmt : E ... NEWLINE
 | C ... NEWLINE
 | CALL ... NEWLINE
 ;


Use stmt as the start symbol for the parser, and you have imposed
contextual rules for tokens via defining what are valid stmt's.

Christian Convey wrote:
 That is: beginning of line, the letter C, zero or more
 non-end-of-line characters, end-of-line.

 My problem is, to my knowledge ANTLR won't let me define tokens that
 match on the beginning of a line ('^').

 Any suggestions?

 There is no need to match such positions: when you match a certain line (a
 token that ends with a line break), the next character will be the first in
 a (new) line.
 Something like this should do the trick:

 grammar Test;
 parse
   : (Comment | Line)+ EOF
   ;
 Comment
   :  'C' ~('\r' | '\n')* (NewLine | EOF)
   ;
 Line
   :  ~'C' ~('\r' | '\n')* (NewLine | EOF)
   ;
 fragment
 NewLine
   :  '\r'? '\n'
   |  '\r'
   ;
 
 Thanks, that may work for my particular language, because I may have
 no other tokens that begin with a capital letter 'C'.
 
 But let me wax hypothetical for a minute.  Suppose that in other,
 non-comment lines, I have need to support another token that begins
 with a capital C.  For example, 'CALL'.   So my DSL might have a
 program like this:
 
 C My test
 E CALL FOO
 CALL This is a comment because 'C' is in first column.
 
 Any suggestions for how an ANTLR lexer/grammar should handle this?
  My impression is that something like Flex, whose token regex's can
 match the beginning-of-line imaginary character, would just let me do
 this:
 
 CommentToken ::= ^C.*$
 CallToken ::= ~(^)CALL
 



[il-antlr-interest: 29133] Re: [antlr-interest] Parsing whole-line comments?

2010-06-06 Thread Junkman
Christian Convey wrote:
 
 /* Tokens */
 NEWLINE: '\n' ;
 E:  'E';
 C:  'C';
 CALL: 'CALL';
 // default greediness ensures CALL is matched as CALL instead of C.
 
 Thanks, but 'C' can also be the name of a variable, as long as it's
 not in the first column.  So I don't think greediness solves the whole
 problem.
 

I wonder if this would work better in that case:
---
/* Tokens */
NEWLINE: '\n' ;

/* Parsing rules */
stmt : 'E' ... NEWLINE
 | 'C' ... NEWLINE
 | 'CALL'  ... NEWLINE
 ;
---

Not sure since I don't know how explicitly defined tokens are treated
differently from tokens implicitly defined in parsing rules.

Alternatively, you can apply semantic predicate to lexer rules like this:


C:  { $pos == 0 }?=> 'C' ;



It should only match C at the beginning of the line, but I found (in
my noob experiences) semantic predicate can be pretty tricky due to
hoisting out business and how it affects prediction DFA construction -
I'm sure more experienced hands can tell you better.
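
The column-0 idea can be illustrated outside ANTLR with a small Java sketch. This is not generated lexer code: the class ColumnZeroSketch, the tokenize() method, and the "ID:" labels are hypothetical, and every word other than CALL is treated as an identifier purely for brevity.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnZeroSketch {
    // Tokenizes one line; a 'C' in column 0 turns the whole line into a comment.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        if (line.startsWith("C")) {      // plays the role of the pos == 0 check
            out.add("COMMENT");
            return out;
        }
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;                // blank line: nothing to emit
            }
            // Away from column 0, 'C' is an ordinary identifier and CALL a keyword.
            out.add(word.equals("CALL") ? "CALL" : "ID:" + word);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("C My test"));
        System.out.println(tokenize("E CALL FOO"));
        System.out.println(tokenize("CALL This is a comment"));
    }
}
```

The third line is classified as a comment even though it begins with the text CALL, which is the case that greediness alone does not handle.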

Good luck.




[il-antlr-interest: 29115] Re: [antlr-interest] Multiple lexer tokens per rule

2010-06-03 Thread Junkman
Try this to get you started:
-
@lexer::members {

// Queue to hold additional tokens
private java.util.Queue<Token> tokenQueue =
new java.util.LinkedList<Token>();

// Include queue in reset().
public void reset() {
super.reset();
tokenQueue.clear();
}

// Queued tokens are returned before matching a new token.
public Token nextToken() {
if (tokenQueue.peek() != null)
return tokenQueue.poll();
return super.nextToken();
}

}

MATCHED_TOKEN:  ...
{
// Add additional tokens to the queue.
tokenQueue.add( new CommonToken( ... ) );
}

-

MATCHED_TOKEN is returned first, and additional tokens queued by
MATCHED_TOKEN's action are returned subsequently before matching new
tokens in the input stream.

Instantiate the additional token accordingly if you need input stream
context - see Lexer.emit().
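
The ordering described above can be checked with a small simulation that does not depend on the ANTLR runtime. Everything here is a hypothetical stand-in: BaseLexer plays the role of the generated lexer's superclass, match() plays the role of a lexer rule plus its action, and tokens are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

final class TokenQueueSketch {
    // Stand-in for the base lexer: each call matches one token from the input.
    static class BaseLexer {
        private final Iterator<String> input;

        BaseLexer(String... lexemes) {
            this.input = Arrays.asList(lexemes).iterator();
        }

        public String nextToken() {
            return input.hasNext() ? match(input.next()) : "EOF";
        }

        // Stand-in for a lexer rule and its action.
        protected String match(String lexeme) {
            return lexeme;
        }
    }

    static class QueueingLexer extends BaseLexer {
        private final Queue<String> tokenQueue = new ArrayDeque<>();

        QueueingLexer(String... lexemes) {
            super(lexemes);
        }

        // Queued tokens are returned before a new token is matched.
        @Override
        public String nextToken() {
            if (!tokenQueue.isEmpty()) {
                return tokenQueue.poll();
            }
            return super.nextToken();
        }

        // "Rule action": matching "x2" also queues two additional tokens.
        @Override
        protected String match(String lexeme) {
            if (lexeme.equals("x2")) {
                tokenQueue.add("EXTRA1");
                tokenQueue.add("EXTRA2");
            }
            return lexeme;
        }
    }

    // Collects tokens until the EOF marker is returned.
    static List<String> drain(BaseLexer lexer) {
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); !t.equals("EOF"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new QueueingLexer("a", "x2", "b")));
    }
}
```

For the input a, x2, b the tokens come out as a, x2, EXTRA1, EXTRA2, b: the matched token first, then the queued ones, then the next token from the input.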



Ken Williams wrote:
 
 On 6/3/10 4:18 PM, Jim Idle j...@temporal-wave.com wrote:
 
 Add to an array or collection then get nextToken to remove from the
 collection. It is slower to do this so it isn't the default way.
 
 Yeah, that's what the book says. =)
 
 It seems like there are some subtleties involved, though - there's a lot of
 bookkeeping in nextToken() that looks kind of scary (e.g. the
 current-line-number stuff, the default-channel stuff, etc.), and if I
 override it I'm really not confident I'll do it correctly.  I'm also unsure
 how mTokens(), emit(), and nextToken() cooperate with their member
 variables.
 
 I tried this simple-minded implementation, and started getting out-of-bounds
 exceptions:
 
 @lexer::members {
 List<Token> tokBuf = new ArrayList<Token>();
 public Token nextToken() {
 while (tokBuf.isEmpty()) {
 emit();
 }
 return tokBuf.remove(0);
 }
 public void emit(Token token) {
 tokBuf.add(token);
 }
 }
 
 
 So if someone does have a working example, I'd love to see it!
 





[il-antlr-interest: 29072] Re: [antlr-interest] Comments, EOF, and Debugger

2010-06-01 Thread Junkman
Disclaimer:  I'm a noob.  :)

Taking the newline out of comment seems to work, like this:

COMMENT : '#' (~( '\r' | '\n' ))* ;
NEWLINE : '\r'? '\n'
  {
  // kick it off to the hidden channel
  // $channel=HIDDEN;

  // or skip it altogether
  // skip();

  }
  ;

Last line comment terminating in EOF presents no problem.

I've seen this pattern for comment in other examples.
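
As a quick sanity check of the idea, the two comment shapes can be compared with java.util.regex. This is only an analogy under stated assumptions: the regular expressions approximate the lexer rules, and the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

final class CommentAtEofSketch {
    // Newline is part of the comment, roughly like the ANTLRWorks template rule.
    static final Pattern WITH_NEWLINE = Pattern.compile("#[^\\r\\n]*\\r?\\n");

    // Newline is left out of the comment, roughly like the COMMENT rule above.
    static final Pattern WITHOUT_NEWLINE = Pattern.compile("#[^\\r\\n]*");

    // True when the pattern matches the entire input string.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String lastLine = "# trailing comment"; // no newline before end of input
        System.out.println(matchesWhole(WITH_NEWLINE, lastLine));
        System.out.println(matchesWhole(WITHOUT_NEWLINE, lastLine));
    }
}
```

Only the second pattern accepts a comment that runs straight into the end of the input, which mirrors why moving the newline into its own token helps.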

Don't know how/why debuggerLexer changes the outcome, but I assume you
can always trace the generated lexers to see how the different outcomes
result.

J

Nathan Eloe wrote:
 
 On Jun 1, 2010, at 1:33 PM, ante...@freemail.hu wrote:
 
 On 6/1/2010 3:33 PM, Nathan Eloe wrote:

 Hello all,
 I'm working on an AST parser for the Bash language and I've come across the 
 following strange behavior:
 I'm trying to handle comments, so I used the comments token you can get 
 when you start a new grammar in ANTLRworks.  It works.

 COMMENT
 :   '#' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
 ;

 The problem arises when the comment is the last thing from the input (i.e, 
 no new line before EOF).  Removing the '\n' from the token causes it to 
 freak out when I run the tests, but I can't get it to match comments at the 
 end of file.  Leaving that '\n' in lets the code compile, but I still can't 
 match that last case.

 Here's where the interesting part happens.  When I run it through the 
 debugger with the same test case that I use in gunit, the debugger allows 
 the input and parses it correctly (meaning, it ignores it as it should) and 
 correctly generates the expected AST.

 Does the debugger allow the code to be more robust in its decision making
 abilities?  Or does it do something to the input to allow it to be matched
 to a token?

 Thanks for the help!

 Nathan
 





[il-antlr-interest: 29030] Re: [antlr-interest] greedy subrule option idiom

2010-05-28 Thread Junkman
Here is another variation of the grammar:

--

grammar Test;

fragment
CHAR:   '\u'..'#' | '$'..'\u' ;

STRING  :   '##' ( options {greedy=false;} : CHAR )* '##' ;

stmt:   
( . )+
;




This generates a grammar check error just like the one in my previous post
(attached at the bottom).

The error goes away if I pull the character '#' out of CHAR and inline
it into STRING with the '|' operator next to CHAR, like this:



grammar Test;

fragment
CHAR:   '\u'..'"' | '$'..'\u' ;

STRING  :   '##' ( options {greedy=false;} : CHAR | '#' )* '##' ;

stmt:   
( . )+
;

-

Looks like the DFA needs '#' at the top level of the non-greedy subrule
because the character also matches the beginning of the exit branch (and
hence requires more lookahead to decide).
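
To make the lookahead point concrete, here is a small hand-written Python
scanner for a '##' ... '##' string.  It is only a sketch of the decision
the loop has to make, not what ANTLR generates; the function name and the
return convention are assumptions for the example.

```python
def scan_hash_string(text, start=0):
    """Scan a '##...##' string at `start`.

    Returns the index just past the closing '##', or -1 if there is no
    complete string at that position.
    """
    if not text.startswith("##", start):
        return -1
    i = start + 2
    while i < len(text):
        # Exit branch: one '#' is not enough to decide, because a lone
        # '#' is also a legal body character.  Two characters are needed.
        if text.startswith("##", i):
            return i + 2
        # Loop branch: consume any single character, including a lone '#'.
        i += 1
    return -1  # ran off the end without seeing the closing '##'


end = scan_hash_string("##a#b## tail")
unterminated = scan_hash_string("##abc")
```

The single '#' between 'a' and 'b' is consumed as body text, and the scan
only ends at the doubled '##'.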

I'd like to know if this is known (and consistent) behavior.  Or perhaps
I'm way off because I missed something very basic in the grammars above.

I did a quick search of the list archive using the MarkMail link Jim
provided and found a recent thread on non-greedy loops, but it concerns
a suggestion for v4, and I'm not sure it's directly applicable to this
question.

Sorry if it seems like I'm beating a dead horse.  Being a noob makes me
want to dot every i and j twice.

Junkman wrote:
 Hello,
 
 The following grammar generates error:
 
 -
 grammar Test;
 
 fragment
 CHAR  :   . ;
 
 STRING:   '"' ( options {greedy=false;} : CHAR )* '"' ;
 
 stmt  :   
   ( . )+
   ;
 
 -
 
 The error message generated by the Check Grammar option of ANTLRWorks (1.4) is:
 
 [15:34:52] error(201): Test.g:6:47: The following alternatives can never
 be matched: 2
 
 I think it means it cannot exit the non-greedy subrule (of the lexer
 rule STRING).
 
 If I substitute . directly for CHAR, no error.
 
 Is this the expected behavior?  Is there a problem with the grammar
 given above?
 
 Thanks for any insight/assistance.
 
 J
 
 Junkman wrote:
 Hello,

 The following is a lexer rule to match a quoted string that allows
 backslash escape sequences.


 STRING
  :'"' ( options {greedy=false;} : ( ~ '\\' | '\\' . ) )* '"'
  ;


 It seems to work.  But if you put the '*' operator inside the subrule
 like this:


 STRING
  :'"' ( options {greedy=false;} : ( ~ '\\' | '\\' . )* ) '"'
  ;


 It eats up everything to EOF.

 It's as if the greedy option applies to the ((subrule)*) instead of the
 subrule itself, and only if the subrule is suffixed with '*' operator
 (or with '+') externally (as in (subrule)*).

 To my eyes, the second version seems the correct one.

 Thoughts?

 J

 
 





[il-antlr-interest: 29006] [antlr-interest] greedy subrule option idiom

2010-05-26 Thread Junkman
Hello,

The following is a lexer rule to match a quoted string that allows
backslash escape sequences.


STRING
:'"' ( options {greedy=false;} : ( ~ '\\' | '\\' . ) )* '"'
;


It seems to work.  But if you put the '*' operator inside the subrule
like this:


STRING
:'"' ( options {greedy=false;} : ( ~ '\\' | '\\' . )* ) '"'
;


It eats up everything to EOF.

It's as if the greedy option applies to the ((subrule)*) instead of the
subrule itself, and only if the subrule is suffixed with '*' operator
(or with '+') externally (as in (subrule)*).
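
For comparison, the same greedy versus non-greedy difference shows up with
Python regular expressions.  This is only an analogy: it assumes the string
delimiters are double quotes, the pattern names and sample text are made
up, and Python's backtracking matcher is not how ANTLR decides when to
leave a loop.

```python
import re

# Loop body mirrors ( ~'\\' | '\\' . ): any non-backslash character,
# or a backslash followed by any character.
NON_GREEDY = re.compile(r'"(?:[^\\]|\\.)*?"')
GREEDY = re.compile(r'"(?:[^\\]|\\.)*"')

# Hypothetical input: a string containing an escaped quote, then more text.
text = r'"a\"b" rest "c"'

# The non-greedy loop leaves at the first unescaped closing quote.
first = NON_GREEDY.match(text).group()

# The greedy loop keeps going and only stops at the last quote in the input.
longest = GREEDY.match(text).group()
```

Here the non-greedy form yields just the first quoted string, escaped
quote included, while the greedy form swallows everything up to the final
quote.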

To my eyes, the second version seems the correct one.

Thoughts?

J














[il-antlr-interest: 28994] Re: [antlr-interest] Dynamic scope for lexer rule

2010-05-25 Thread Junkman
Thanks for the reply, Jim.  I understand the rationale for your suggestion.

Might this info be worth adding to the wiki?  It may be obvious to
seasoned hands, but a note on how the constraints differ between lexer
and parser rules would be helpful, especially since both rule types share
the same basic syntax in Antlr.

Junkman

Jim Idle wrote:
 Scopes are not supported for lexer rules, you need to implement your own 
 things to do this, but try to leave any kind of context out of the lexer if 
 you can. You want to push such things as high up the tool chain as you can. 
 It isn't always possible, though.
 
 Jim
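
One way to read "implement your own things" is to keep the rule's state on
an explicit stack owned by the lexer.  The Python sketch below shows that
idea with a toy hand-written lexer; the class, its fields and the "dynamic"
key are all hypothetical, and it is not ANTLR-generated code or an ANTLR
API.

```python
class TinyLexer:
    """Toy hand-written lexer; every name here is hypothetical."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        # Explicit stack standing in for a rule-level scope { ... } block.
        self.comment_scope = []

    def comment(self):
        """Match '#' up to, but not including, the line end."""
        if not self.text.startswith("#", self.pos):
            raise ValueError("expected '#' at position %d" % self.pos)
        # Push a fresh frame on entry, much as @init would initialize it.
        self.comment_scope.append({"dynamic": None})
        try:
            start = self.pos
            while self.pos < len(self.text) and self.text[self.pos] not in "\r\n":
                self.pos += 1
            frame = self.comment_scope[-1]
            frame["dynamic"] = self.text[start:self.pos]
            return frame["dynamic"]
        finally:
            # Pop on exit so the state lives only while the rule runs.
            self.comment_scope.pop()


lexer = TinyLexer("# hi\nx")
lexeme = lexer.comment()
```

The stack is empty again once comment() returns, which is the part of
dynamic scoping this sketch tries to imitate.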
 
 -Original Message-
 From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
 boun...@antlr.org] On Behalf Of Junkman
 Sent: Monday, May 24, 2010 3:22 PM
 To: antlr-interest@antlr.org
 Subject: Re: [antlr-interest] Dynamic scope for lexer rule

 Greetings,

 Let me raise the question again.  Sorry that this is becoming something
 of a pattern for me.

 Adding a dynamically scoped attribute to a lexer rule seems to generate
 the error message (shown at the bottom as part of my previous post on
 this subject) when generating recognizers.

 The grammar is as follows:
 -

 grammar Junkscript;



 NEWLINE
 @init { $channel=HIDDEN; }
  :   '\n'
  ;


 COMMENT
 /*
 scope {
  String dynamic;
  }
 @init {
  $COMMENT::dynamic = null;
  }
  */
  :   '#' ( options {greedy=false;} : (~ NEWLINE)* ) ;


 stmt :
  ( . )+
  ;



 

 The simple grammar works fine, but with the scope section (along with
 init action) under COMMENT uncommented, Antlr generates the error.

 Are dynamically scoped attributes allowed for lexer rules?   If so, what
 is the error in the grammar above?

 Thanks for any assistance.

 Junkman

 Junkman wrote:
 Greetings,

 I've added an attribute with dynamic scoping to a lexer rule, and when
 generating code, I'm encountering an internal error.  Listed below is
 partial call stack reported:

 error(10): internal error: Junkscript.g : java.lang.NullPointerException
 org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
 at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
 at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
 ...


 Is dynamic scoping allowed for lexer rule attributes?

 Thanks for any info.

 J









[il-antlr-interest: 0] Re: [antlr-interest] Dynamic scope for lexer rule

2010-05-24 Thread Junkman
Greetings,

Let me raise the question again.  Sorry that this is becoming something
of a pattern for me.

Adding a dynamically scoped attribute to a lexer rule seems to generate
the error message (shown at the bottom as part of my previous post on
this subject) when generating recognizers.

The grammar is as follows:
-

grammar Junkscript;



NEWLINE 
@init { $channel=HIDDEN; }
:   '\n'
;


COMMENT
/*
scope {
String dynamic;
}
@init {
$COMMENT::dynamic = null;
}
*/
:   '#' ( options {greedy=false;} : (~ NEWLINE)* ) ;


stmt:   
( . )+
;





The simple grammar works fine, but with the scope section (along with
init action) under COMMENT uncommented, Antlr generates the error.

Are dynamically scoped attributes allowed for lexer rules?   If so, what
is the error in the grammar above?

Thanks for any assistance.

Junkman

Junkman wrote:
 Greetings,
 
 I've added an attribute with dynamic scoping to a lexer rule, and when
 generating code, I'm encountering an internal error.  Listed below is
 partial call stack reported:
 
 error(10): internal error: Junkscript.g : java.lang.NullPointerException
 org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
 at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
 at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
 at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
 ...
 
 
 Is dynamic scoping allowed for lexer rule attributes?
 
 Thanks for any info.
 
 J
 
 
 
 
 





[il-antlr-interest: 28966] [antlr-interest] Dynamic scope for lexer rule

2010-05-22 Thread Junkman
Greetings,

I've added an attribute with dynamic scoping to a lexer rule, and when
generating code, I'm encountering an internal error.  Listed below is
partial call stack reported:

error(10): internal error: Junkscript.g : java.lang.NullPointerException
org.antlr.grammar.v2.DefineGrammarItemsWalker.ruleScopeSpec(DefineGrammarItemsWalker.java:1050)
at org.antlr.grammar.v2.DefineGrammarItemsWalker.rule(DefineGrammarItemsWalker.java:891)
at org.antlr.grammar.v2.DefineGrammarItemsWalker.rules(DefineGrammarItemsWalker.java:576)
at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammarSpec(DefineGrammarItemsWalker.java:361)
at org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:193)
at org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702)
at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351)
...


Is dynamic scoping allowed for lexer rule attributes?

Thanks for any info.

J








[il-antlr-interest: 28939] Re: [antlr-interest] Referencing attributes

2010-05-20 Thread Junkman
Sorry for dupe, but I'm hoping to get some/any response.

Are attribute references allowed outside actions and action-like elements
 (e.g., semantic predicates), other than as parameters in rule invocations?

Thanks for any info.

J

Junkman wrote:
 Greetings,
 
 I'm an Antlr noob, and have a question regarding accessing attributes.
 
 Where, outside of actions, can you reference attributes?  One place seems
 to be as a parameter to a rule invocation, like this:
 
 decl: type declarator[ $type.text ] ';' ;
  
 This is from The Definitive ANTLR Reference, page 119.
 
 Is that true in general?  Are there other locations outside of actions
 where attributes can be accessed?
 
 As noted, I am a noob to Antlr and just joined this list.  Please let me
 know if this email's question/topic is not appropriate to the list.
 
 Thanks.
 
 
 





[il-antlr-interest: 28914] [antlr-interest] Referencing attributes

2010-05-19 Thread Junkman
Greetings,

I'm an Antlr noob, and have a question regarding accessing attributes.

Where, outside of actions, can you reference attributes?  One place seems
to be as a parameter to a rule invocation, like this:

decl: type declarator[ $type.text ] ';' ;
 
This is from The Definitive ANTLR Reference, page 119.
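
To spell out how I read that rule, here is a hand-written Python sketch in
which the text matched by the type is handed down to the declarator as an
ordinary function argument.  The function names, the token list, and the
declarator signature shown in the docstring are my own assumptions, not
something taken from the book or from generated ANTLR code.

```python
def parse_declarator(token, type_name):
    """Assumed shape: declarator[String typeName] : ID ;"""
    if not token.isidentifier():
        raise SyntaxError("expected an identifier, got %r" % token)
    # The rule can use the inherited type name alongside its own match.
    return (token, type_name)


def parse_decl(tokens):
    """decl : type declarator[$type.text] ';' ;"""
    if len(tokens) != 3 or tokens[2] != ";":
        raise SyntaxError("expected: type declarator ';'")
    type_text = tokens[0]  # plays the role of $type.text
    return parse_declarator(tokens[1], type_text)


result = parse_decl(["int", "x", ";"])
```

In this reading, the bracketed argument is evaluated by the invoking rule
and passed along, the same way type_text is passed here.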

Is that true in general?  Are there other locations outside of actions
where attributes can be accessed?

As noted, I am a noob to Antlr and just joined this list.  Please let me
know if this email's question/topic is not appropriate to the list.

Thanks.


