[il-antlr-interest: 34916] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Bart Kiers
Hi John,


On Tue, Nov 15, 2011 at 11:46 PM, John B. Brodie  wrote:

> Greetings!
> ...
> I do not think you want to recognize floating point values in the
> parser. any tokens you send to the HIDDEN $channel (or skip();) will be
> silently accepted before and after the '.' of the float. change your
> INTEGER rule to this:
>

I fully agree...



> fragment FLOAT: ;
> INTEGER : DIGIT+ ('.' DIGIT+ {$type=FLOAT;} )? ;
>
> and use FLOAT in the number rule.
>

.. however, Jarrod's grammar allows for input to end with `expression '.'`,
which could be "123." (an INTEGER followed by a DOT). This would be input
the lexer would trip over.

A possible fix could look like:

INTEGER
  :  DIGIT+ ({input.LA(1)=='.' && input.LA(2)>='0' && input.LA(2)<='9'}?=>
     '.' DIGIT+ {$type=FLOAT;})?
  ;

I.e., only match a '.' if the character after the '.' is a digit.
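
The same lookahead check can be sketched outside ANTLR in plain C. This is only an illustration of the idea, not generated lexer code; `scan_number`, `TOK_INTEGER` and `TOK_FLOAT` are made-up names for this example:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum tok_kind { TOK_NONE, TOK_INTEGER, TOK_FLOAT };

/*
 * Scan a number at the start of s and store the consumed length in *len.
 * The '.' is consumed only when the character after it is a digit, so
 * "123." yields an INTEGER of length 3 and leaves the '.' for a DOT token.
 */
enum tok_kind scan_number(const char *s, size_t *len)
{
    size_t i = 0;

    while (isdigit((unsigned char)s[i]))
        i++;
    if (i == 0) {
        *len = 0;
        return TOK_NONE;            /* no leading digit: not a number */
    }

    /* Same test as the predicate: LA(1) == '.' and LA(2) in '0'..'9'. */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;                        /* consume the '.' */
        while (isdigit((unsigned char)s[i]))
            i++;
        *len = i;
        return TOK_FLOAT;
    }

    *len = i;
    return TOK_INTEGER;
}
```

Calling it on "123." stops before the dot, while "1.1." consumes "1.1" and leaves the trailing dot.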

Regards,

Bart.

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 34917] [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
Hi Jim,

I have spent 2 days running around this, and now I am ready to describe what I
see and get your help. It seems there is a bug/leak in the reuse() area. Or I
am not using it correctly, but I do as you described in a single letter 3 months
ago.

So ... Long story :-)

* I have a simple bench that does 100K INSERT commands.

The v2 parser does this in 19 seconds.
The v3 parser without reuse does this in 24 seconds.

OF COURSE we should expect a speedup if we reuse the lexer/parser.

So I have designed the code to be able to switch easily between these 2 ways.
And when I try to go with reuse I get comparable speed, but 2GB of RAM eaten.

=
* Using Apple XCODE 4.2 Instruments, I see what is going on.

   These are not actually leaks; the parser just keeps allocating
ANTLR_STRING objects in parser and tree-parser rules which use

$c.text


=
FOR EXAMPLE:

* I did have in the parser rule:

hex_string_literal
:s = HEX_NUMBER -> CONST_STR_HEX[$s.text->chars]
;

ZERO my own code here. Right?
And I see that $s.text in C code expands to getText(), which allocates and
allocates ...
So it is never reused, as I understand.



=
BTW

When I saw that getText() is used, and remembered you told me to avoid
this,
I jumped to the sources and came to an idea:

why create a new token here, for which I need getText() ??
Maybe I can just change the token type as follows:

hex_string_literal
:s = HEX_NUMBER  { $s->setType( $s, CONST_STR_HEX ); }
;

And it seems this works fine.

I have corrected a few rules this way in the parser.
But the Tree Parser still has, for example, this:

general_literal returns [ENode_Const_Ptr res]
: cd=CONST_DATE {res=make_enode_date ( GET_FBL_STRING( $cd.text) );}
| ct=CONST_TIME {res=make_enode_time ( GET_FBL_STRING( $ct.text) );}
| s=const_str   {res=make_enode_str  ( GET_FBL_STRING( $s.text ) );}
;

All these $c.text uses call getText() -- this makes a COPY of the string buffer,
then I convert it into our own FBL_String...

PROBLEM 1:  the ANTLR strings produced by getText() are never reused, as far
as I can see.

PROBLEM 2:  also related to speed: how can we avoid making a copy of the
string here?
 In the sources I see code such as

return ((pANTLR3_COMMON_TREE)(tree->super))->token->getText(
   ((pANTLR3_COMMON_TREE)(tree->super))->token);


Maybe something can be optimized/hacked here?
For example, maybe I can write my own func, which checks whether the token has
  char* or ANTLR_String, and chooses the way ...

But what syntax gets me to the token in the .g?
I can do my own macro of course ...
I just want some feedback on whether this would be a good idea for everyone.


=
And this is how I try to reuse the Lexer/Parser and NOT the TreeParser.
All following your letter, Jim:

void SqlParser_v3::ResuseParserObjects(
    const char* inTextToParse,
    vuint32     inLength )
{
    // ---
    // TREE PARSER cannot be reused. Destroy it.
    //
    if( mpTreeParser )
    {
        mpTreeParser->free( mpTreeParser );
        mpTreeParser = NULL;
    }

    if( mpNodes )
    {
        mpNodes->free( mpNodes );
        mpNodes = NULL;
    }


    // ---
    // Reuse other objects
    //
    mpInput->reuse(
        mpInput,
        (pANTLR3_UINT8) inTextToParse,
        (ANTLR3_UINT32) inLength,
        (pANTLR3_UINT8) "VSQL" );

    mpTokenStream->reset( mpTokenStream );
    mpLexer->reset( mpLexer );
    mpParser->reset( mpParser );

    ResetOwnData( mpParser );
}





-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]





[il-antlr-interest: 34918] [antlr-interest] question on java.g 1.6

2011-11-16 Thread Jeremy Long
I am still fairly new to antlr and when looking at the 1.6 Java grammar I
noticed the following statement rule:

statement
:   block

|   ('assert'
)
expression (':' expression)? ';'
|   'assert'  expression (':' expression)? ';'
|   'if' parExpression statement ('else' statement)?
|   forstatement
|   'while' parExpression statement
|   'do' statement 'while' parExpression ';'
|   trystatement
|   'switch' parExpression '{' switchBlockStatementGroups '}'
|   'synchronized' parExpression block
|   'return' (expression )? ';'
|   'throw' expression ';'
|   'break'
(IDENTIFIER
)? ';'
|   'continue'
(IDENTIFIER
)? ';'
|   expression  ';'
|   IDENTIFIER ':' statement
|   ';'
;


My question is about the two lines for assert:
|   ('assert'
)
expression (':' expression)? ';'
|   'assert'  expression (':' expression)? ';'
To me those look identical - am I missing something? Is there some nuance
to the parens that I don't understand?

Thanks,

jeremy




[il-antlr-interest: 34920] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Jim Idle

Do not use the $text annotations if you want performance, they are purely
for convenience – I must have said this 5000 times and I wish I had never
added that bit ;) I also told you 3 or 4 times in various emails not to use
it. I think that that is in the API docs somewhere, but I should make sure
that it is, if it is not.


There is no memory leak, but the auto string stuff does not release until
you free the string factory, which only happens when you free the parser,
not when you reuse it. Because it allocates small strings all the time, it
kills performance, and then you will page.
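
A toy model of that behaviour, assuming nothing about the real runtime internals: the `toy_factory` type and its functions below are invented for this sketch and are not the ANTLR3 C API. Every string the factory hands out stays allocated until the factory itself is closed, so reusing the parser without closing the factory only grows the count:

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string factory. Not the ANTLR3 C API. */
struct toy_factory {
    char  **strings;    /* every string handed out so far */
    size_t  count;
    size_t  capacity;
};

/* Copy text into a new factory-owned string; NULL on allocation failure. */
char *toy_new_string(struct toy_factory *f, const char *text)
{
    size_t n;
    char *copy;

    if (f->count == f->capacity) {
        size_t cap = f->capacity ? f->capacity * 2 : 16;
        char **grown = realloc(f->strings, cap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        f->strings = grown;
        f->capacity = cap;
    }
    n = strlen(text) + 1;
    copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, n);
    f->strings[f->count++] = copy;
    return copy;
}

/* The only place where strings are released ("freed with the parser"). */
void toy_factory_close(struct toy_factory *f)
{
    size_t i;

    for (i = 0; i < f->count; i++)
        free(f->strings[i]);
    free(f->strings);
    f->strings = NULL;
    f->count = 0;
    f->capacity = 0;
}

/* Simulate `statements` parses that each request one token text while
 * the same factory stays alive; returns the number of live strings. */
size_t toy_parse_many(struct toy_factory *f, size_t statements)
{
    size_t i;

    for (i = 0; i < statements; i++)
        if (toy_new_string(f, "0xDEADBEEF") == NULL)
            break;
    return f->count;
}
```

Two rounds of 1000 simulated statements on the same factory leave 2000 small strings alive; only `toy_factory_close` brings the count back to zero.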



xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;



I think that the field name is type but you get the idea. Don’t use the
fake object oriented stuff when you want performance, use the structs
directly – you will find that it is many times faster than the v2 C++, not
slower – this is C and you should get as close to the metal as you can.



I think that I will make some time for performance in v3, by moving the
token interface out of each individual token and so on - originally I could
not predict what people wanted to do, but I don’t see anyone overriding
anything to do with tokens on an individual basis, for instance, and that
eats memory and so on.



Jim




[il-antlr-interest: 34921] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
On 11/16/11 6:00 PM, "Jim Idle"  wrote:

> [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks,
> oops.
> 
> Do not use the $text annotations if you want performance, they are purely
> for convenience - I must have said this 5000 times and I wish I had never
> added that bit ;) I also told you 3 or 4 times in various emails not to use
> it. I think that that is in the API docs somewhere, but I should make sure
> that it is, if it is not.

Right, you did tell me ...

But the docs, the ANTLR books, and the examples all show this

hex_string_literal

:s = HEX_NUMBER  -> CONST_STR_HEX[$s.text->chars]

Yes, I checked the C API docs even today, but have not found any special page
which says

Java guys do this
C guys do this.


> There is no memory leak, but the auto string stuff does not release until
> you free the string factory, which only happens when you free the parser,
> not when you reuse it. Because it allocates small strings all the time, it
> kills performance, and then you will page.

Clear.

So when I "fix" all the places that use .text, the memory problem should
disappear by itself.


> xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;

> I think that the field name is type but you get the idea.

Yes, I will try this asap and give feedback.
I have 40 such places in parser. And some number in the tree parser.


>  Don't use the
> fake object oriented stuff when you want performance, use the structs
> directly - you will find that it is many times faster than the v2 C++, not
> slower - this is C and you should get as close to the metal as you can.

I really hope so :-)

For the PARSER I think I see how I can use this $s.type.
I will check the other 39 places in the parser right now :)

=
It is not clear to me what we can do with the Tree Parser ??

So I have some token, e.g. a date or time or other literal.
I make a label, and now I need to get the TEXT.

general_literal returns [ENode_Const_Ptr res]

: cd=CONST_DATE
{ res=make_enode_date ( GET_FBL_STRING($cd.text) );  }



So far I have found that I can do something like

general_literal returns [ENode_Const_Ptr res]

: cd=CONST_DATE
  {
  pANTLR3_COMMON_TOKEN pToken = $cd->getToken( $cd );
  ANTLR3_MARKER pStart = pToken->getStartIndex( pToken );
  ANTLR3_MARKER pEnd   = pToken->getStopIndex( pToken );
  // Do some job with pStart / pEnd ...
  }


Does such code in a TreeParser look correct to you?

Is it really safe, and do getStartIndex / getStopIndex always return correct
pointers?

Of course this can be extracted into a special func to be used in many places
in one line of code ...

I just believe there is no example in C, nor any docs page, which discusses
this for TreeParser and C. If one exists, please point me to it :-)


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34922] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Jim Idle
All your assumptions below are correct - the methods you are calling there
are public to grammar programmers for this reason. Just lose the $text and
have your own helper methods - for instance you only want the text when it
is time to actually do something with it, and not just to create a new
token that is the same text and position and so on. Your helper methods
can take a token, a start and stop token, a tree node with a payload, and
a tree node with a start and stop span. Even in Java you find that you
need these for good error reporting.
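
One possible shape for such a helper, as a sketch only. It assumes the start and stop positions can be treated as pointers to the first and last character of the token in the original input buffer (which is what the quoted question supposes); `text_span`, `span_length` and `span_copy` are hypothetical names, not runtime functions. The text is copied into a caller-owned buffer only when it is actually needed:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical span: pointers to the first and last character of a
 * token inside the original input buffer (stop is inclusive). */
struct text_span {
    const char *start;
    const char *stop;
};

/* Length of the span in characters; 0 for an empty or invalid span. */
size_t span_length(struct text_span s)
{
    if (s.start == NULL || s.stop == NULL || s.stop < s.start)
        return 0;
    return (size_t)(s.stop - s.start) + 1;
}

/* Copy the span into a caller-owned buffer and NUL-terminate it.
 * Returns the number of characters copied (truncated to fit). */
size_t span_copy(struct text_span s, char *out, size_t out_size)
{
    size_t n = span_length(s);

    if (out_size == 0)
        return 0;
    if (n > out_size - 1)
        n = out_size - 1;
    if (n > 0)
        memcpy(out, s.start, n);
    out[n] = '\0';
    return n;
}
```

A span built from a start token and a stop token, or from a tree node's first and last token, could be fed to the same two functions.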

Sorry that the C runtime takes a lot more grokking, but there isn't all
that object infrastructure to help you. I am still inclined to make a very
streamlined C runtime, that does not allow overrides of much at all, but
is very fast.

Jim





[il-antlr-interest: 34924] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Ruslan Zasukhin
On 11/16/11 6:00 PM, "Jim Idle"  wrote:

> xxx: s=HEX_NUMBER { $s.type = CONST_STR_HEX; } ;

Jim,

This gives an error:
SqlParser_v3.g:879:21: cannot write to read only attribute: $u.type


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]






[il-antlr-interest: 34925] Re: [antlr-interest] [C] my v3 Parser no reuse() slower 20% than v2. With reuse() 2GB leaks, oops.

2011-11-16 Thread Jim Idle
Do it without the .type then, just use C code directly. Don't forget that
you can do things like:

{
  pANTLR3_BASE_TOKEN t;
  t = LT(-1);
  t->type = XXCX;
}

Or perhaps
 myHelper($s, MYTYPE);
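
A helper along those lines might look roughly like this in plain C. The struct and the type codes are toy stand-ins invented for this example, not the runtime's definitions; the point is only that the type is changed in place, with no new token and no text copy:

```c
#include <stddef.h>

/* Toy token, standing in for the runtime's token struct. */
struct toy_token {
    unsigned    type;
    const char *start;   /* first character of the token text */
    const char *stop;    /* last character of the token text  */
};

/* Hypothetical type codes; real ones come from the generated headers. */
enum { HEX_NUMBER = 10, CONST_STR_HEX = 11 };

/* Retag the token in place: the text span is untouched and nothing
 * is allocated. A NULL token is ignored. */
void my_helper(struct toy_token *t, unsigned new_type)
{
    if (t != NULL)
        t->type = new_type;
}
```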

Jim





[il-antlr-interest: 34926] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Jarrod Roberson
On Tue, Nov 15, 2011 at 5:46 PM, John B. Brodie  wrote:

> Greetings!
>
> I think you have issues with your function, number, and ATOM rules. see
> below...
>
>
apparently I did


> I have attached my complete, modified, grammar that successfully parses
> your input sample.
>
>
thanks for taking the time to fix up my problems, you figured out what I
intended when I couldn't!


> just a nit pick here - you really should include EOF in your topmost rule.
>
>
thanks I was not aware this was something I should do


> >
> > statement : expression
> >   | assignment
> >   ;
> >
> > assignment : ID '->' expression
> >| ATOM '->' ( string | number )
> >| function '->' statement ((','statement)=> ',' statement)* ;
>
> I think you are being too liberal here with your function signatures.
> you permit any expression to be a formal argument. are you intending to
> have patterns akin to either ML or Haskell? if not, change the
> definition of function in your assignment rule.
>
>
I am patterning my syntax off what I like about Erlang and Python, hopefully
with some streamlining


> I also think that this permits multi-expression body, something like:
>
> foo(a,b)-> a, b.
>
>
I didn't realize it until you said it, but yes, I only need to allow a
single expression as the LAST statement, because I am having
the LAST statement's result be the return value without needing a "return"
keyword.


> e.g. a function body consisting of two (or more) expressions. do you
> really want that -- you do if your expressions can have side-effects.
>
>
nope single assignment variables and no side effects if I can help it


> maybe the third alt of assignment rule should be something like
> (assuming you do not have side effects and watch out for i/o!):
>
> | ID '(' ID (',' ID)* ')' '->' (assignment ',')* expression ;
>
> this eliminates the need for a predicate.
>
> >
> > args : expression (',' expression)*;
> >
> > function : ID '(' args ')' ;
> >
> > string : UNICODE_STRING;
> > number : HEX_NUMBER
> >| (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
>
> I do not think you want to recognize floating point values in the
> parser. any tokens you send to the HIDDEN $channel (or skip();) will be
> silently accepted before and after the '.' of the float. change your
> INTEGER rule to this:
>
> fragment FLOAT: ;
> INTEGER : DIGIT+ ('.' DIGIT+ {$type=FLOAT;} )? ;
>

actually thanks to Bart I need the FLOAT rule as a parser rule with the
predicate because I want to be able to match

a = 1.
b = 100.1101.


> >
> > ATOM : (('A'..'Z'|'_')+)=> ('A'..'Z'|'0'..'9'|'_')+;
>
> no need for a predicate
>
> ATOM : ('A'..'Z')('A'..'Z'|'0'..'9'|'_')*;
>
> note that this also removes the ambiguity as to whether the string "_"
> is an ATOM or an ID.
>
>
this is what I actually intended, thanks


> >
> > ID : ('a'..'z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
> >
> > COMMENT : '/*' .* '*/' {$channel = HIDDEN;};
> >
>



-- 
Jarrod Roberson
678.551.2852




[il-antlr-interest: 34927] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Bart Kiers
Hi,

On Wed, Nov 16, 2011 at 8:21 PM, Jarrod Roberson wrote:

>
> actually thanks to Bart I need the FLOAT rule as a parser rule with the
> predicate because I want to be able to match


But John raises a valid point that I didn't mention: by "promoting" such a
rule to a parser rule, you run the risk that the parser matches a `number`
rule for the input source: "123   .   5" (spaces around the '.') because
the parser ignores the white spaces.
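
The effect can be shown with a toy tokenizer in C (an illustration only; `visible_tokens` is a made-up function, and it mimics a lexer that knows just INTEGER and DOT and skips white space). Once white space is dropped, the parser sees the same token sequence for both inputs:

```c
#include <ctype.h>
#include <stddef.h>

/* Write one letter per visible token into out: 'I' for a run of digits,
 * 'D' for '.', '?' for anything else. White space is dropped, the way a
 * parser never sees skipped or hidden tokens. Returns the token count. */
size_t visible_tokens(const char *s, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;
    while (*s != '\0' && n + 1 < out_size) {
        if (isspace((unsigned char)*s)) {
            s++;                        /* skipped: invisible to the parser */
        } else if (isdigit((unsigned char)*s)) {
            while (isdigit((unsigned char)*s))
                s++;
            out[n++] = 'I';
        } else {
            out[n++] = (*s == '.') ? 'D' : '?';
            s++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Both "123.5" and "123   .   5" come out as the three tokens I, D, I, so a parser rule of the form INTEGER '.' INTEGER cannot tell them apart.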

Regards,

Bart.




[il-antlr-interest: 34928] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Bart Kiers
On Wed, Nov 16, 2011 at 8:38 PM, Bart Kiers  wrote:

> Hi,
>
> On Wed, Nov 16, 2011 at 8:21 PM, Jarrod Roberson 
> wrote:
>
>>
>> actually thanks to Bart I need the FLOAT rule as a parser rule with the
>> predicate because I want to be able to match
>
>
> But John raises a valid point that I didn't mention: by "promoting" such a
> rule to a parser rule, you run the risk that the parser matches a `number`
> rule for the input source: "123   .   5" (spaces around the '.') because
> the parser ignores the white spaces.
>

Or even the input: "123 /* some comments */ . /* more comments */ 5" would
be a valid `number`... :)




[il-antlr-interest: 34929] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Jarrod Roberson
On Wed, Nov 16, 2011 at 2:39 PM, Bart Kiers  wrote:

> On Wed, Nov 16, 2011 at 8:38 PM, Bart Kiers  wrote:
>
>> Hi,
>>
>> On Wed, Nov 16, 2011 at 8:21 PM, Jarrod Roberson 
>> wrote:
>>
>>>
>>> actually thanks to Bart I need the FLOAT rule as a parser rule with the
>>> predicate because I want to be able to match
>>
>>
>> But John raises a valid point that I didn't mention: by "promoting" such
>> a rule to a parser rule, you run the risk that the parser matches a
>> `number` rule for the input source: "123   .   5" (spaces around the '.')
>> because the parser ignores the white spaces.
>>
>
> Or even the input: "123 /* some comments */ . /* more comments */ 5" would
> be a valid `number`... :)
>

Is there a way to support both

a -> 1.
b -> 1.1.

in a pure lexer rule, then? I didn't think there was.

-- 
Jarrod Roberson
678.551.2852




[il-antlr-interest: 34930] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Bart Kiers
On Wed, Nov 16, 2011 at 8:45 PM, Jarrod Roberson wrote:

>
> > Or even the input: "123 /* some comments */ . /* more comments */ 5"
> would
> > be a valid `number`... :)
> >
>
> Is there a way to support both
>
> a -> 1.
> b -> 1.1.
>
> in a pure lexer rule then, I didn't think there was?
>
>
See my earlier reply: http://antlr.markmail.org/message/wtwq2vbmhedek2cn in
this thread.

"1." would become: INTEGER, DOT
"1.1." would become: FLOAT, DOT
"1 . 1" would become: INTEGER SPACE DOT SPACE INTEGER

Regards,

Bart.




[il-antlr-interest: 34931] Re: [antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

2011-11-16 Thread Jarrod Roberson
On Wed, Nov 16, 2011 at 2:55 PM, Bart Kiers  wrote:

> On Wed, Nov 16, 2011 at 8:45 PM, Jarrod Roberson 
> wrote:
>
>>
>> > Or even the input: "123 /* some comments */ . /* more comments */ 5"
>> > would be a valid `number`... :)
>> >
>>
>> Is there a way to support both
>>
>> a -> 1.
>> b -> 1.1.
>>
>> in a pure lexer rule then, I didn't think there was?
>>
>>
> See my earlier reply: http://antlr.markmail.org/message/wtwq2vbmhedek2cn in
> this thread.
>

That message hadn't made it into my inbox yet.

Thanks, that works after I fixed the second comparison to input.LA(2)<='9'.
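
For anyone reading along, the difference between the predicate as posted and
the fixed one comes down to the upper bound of the digit check. A small
plain-Java sketch (the class and method names are invented for this note; the
int parameter stands in for the value of input.LA(2)):

```java
// Compares the range check as posted with the corrected one.
final class DigitRangeCheck {

    // As posted: both comparisons use >=, so '1'..'8' are rejected
    // and anything at or above '9' (including letters) is accepted.
    static boolean asPosted(int la2) {
        return la2 >= '0' && la2 >= '9';
    }

    // Corrected: a lower and an upper bound, so only '0'..'9' pass.
    static boolean corrected(int la2) {
        return la2 >= '0' && la2 <= '9';
    }

    public static void main(String[] args) {
        for (char c : new char[] {'0', '5', '9', 'a'}) {
            System.out.println("'" + c + "' asPosted=" + asPosted(c)
                    + " corrected=" + corrected(c));
        }
    }
}
```

With the posted condition, "1.5" would not be lexed as a FLOAT because '5'
fails the second comparison, while "1.a" would wrongly pass the check.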

You guys' help is dragging me along into the world of ANTLR!
-- 
Jarrod Roberson




[il-antlr-interest: 34933] Re: [antlr-interest] question on java.g 1.6

2011-11-16 Thread Eric
Hi Jeremy,

This is not an answer, just my thoughts after reading it.

It looks like the first option originally had some other text with it that
was removed, and the second option was valid as originally written.

If it were me, I would look for earlier versions of the grammar, for example
from ANTLR 2.x, or see if something turns up in the ANTLR repository on GitHub.

Eric

On Wed, Nov 16, 2011 at 10:08 AM, Jeremy Long  wrote:

> I am still fairly new to antlr and when looking at the 1.6 Java grammar I
> noticed the following statement rule:
>
> statement
>:   block
>
>|   ('assert'
>)
>expression (':' expression)? ';'
>|   'assert'  expression (':' expression)? ';'
>|   'if' parExpression statement ('else' statement)?
>|   forstatement
>|   'while' parExpression statement
>|   'do' statement 'while' parExpression ';'
>|   trystatement
>|   'switch' parExpression '{' switchBlockStatementGroups '}'
>|   'synchronized' parExpression block
>|   'return' (expression )? ';'
>|   'throw' expression ';'
>|   'break'
>(IDENTIFIER
>)? ';'
>|   'continue'
>(IDENTIFIER
>)? ';'
>|   expression  ';'
>|   IDENTIFIER ':' statement
>|   ';'
>;
>
>
> My question is about the two lines for assert:
>|   ('assert'
>)
>expression (':' expression)? ';'
>|   'assert'  expression (':' expression)? ';'
> To me those look identical - am I missing something? Is there some nuance
> to the parens that I don't understand?
>
> Thanks,
>
> jeremy
>




[il-antlr-interest: 34934] Re: [antlr-interest] This should be easy - but I can't figure it out

2011-11-16 Thread John B. Brodie
On 11/16/2011 12:02 AM, Voelkel, Andy wrote:
> 
> array: '[' FLOAT+ | STRING+ ']' -> ^(ARRAY FLOAT+ STRING+) ;
> 
> the separate lists on the right of the -> work because your syntax
> specifies separate lists.
> 
> [Andy - this approach doesn't work - I get exceptions thrown. I haven't 
> debugged that yet.]

yes, sorry. I got the cardinality on the right of the -> wrong. it
should be (i think):

array: '[' (FLOAT+ | STRING+) ']' -> ^(ARRAY FLOAT* STRING*) ;

(note the parentheses around the two alternatives, so that the brackets
and the rewrite apply to both of them.)

when specifying an array of floats, the list of FLOATs is full and
the list of STRINGs is empty, so we must have STRING* on the right
hand side. similar reasoning applies for FLOAT*.
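
to make the cardinality argument concrete, here is a rough plain-Java model
of what the right hand side has to cope with. it does not use the ANTLR
runtime at all: the class and method names are invented, the tree is just a
flat list of strings, and IllegalStateException is only a stand-in for
whatever error the real runtime reports when a + loop finds nothing to emit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of rewrite cardinality; not ANTLR runtime code.
final class RewriteCardinalitySketch {

    // Models X+ on the right of ->: at least one element is required.
    static void emitOneOrMore(List<String> out, List<String> matched, String name) {
        if (matched.isEmpty()) {
            throw new IllegalStateException(name + "+ needs at least one element");
        }
        out.addAll(matched);
    }

    // Models X* on the right of ->: an empty list is fine.
    static void emitZeroOrMore(List<String> out, List<String> matched) {
        out.addAll(matched);
    }

    // ^(ARRAY FLOAT+ STRING+): fails whenever one of the two lists is empty.
    static List<String> buildWithPlus(List<String> floats, List<String> strings) {
        List<String> tree = new ArrayList<>();
        tree.add("ARRAY");
        emitOneOrMore(tree, floats, "FLOAT");
        emitOneOrMore(tree, strings, "STRING");
        return tree;
    }

    // ^(ARRAY FLOAT* STRING*): works for an all-float or all-string array.
    static List<String> buildWithStar(List<String> floats, List<String> strings) {
        List<String> tree = new ArrayList<>();
        tree.add("ARRAY");
        emitZeroOrMore(tree, floats);
        emitZeroOrMore(tree, strings);
        return tree;
    }
}
```

since the syntax only ever fills one of the two lists, the + form can never
succeed in this model, while the * form always can.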

> 
> array: '[' (t+=FLOAT)+ | (t+=STRING)+ ']' -> ^(ARRAY $t+) ;
> 
> [Andy - that doesn't work either. I don't get exceptions, but I get errors 
> and nonsensical output]

not enough information here for me to be of any help to you, sorry. as
far as i recall this works in the Java target, maybe something is
different with the C# target. or more likely i have mis-remembered it...

> 
> array
> : ( l='[' (f+=FLOAT)+  ']' -> ^(ARRAY_FLOAT ["FLT ARY",$l] $f+) )
> | ( l='[' (s+=STRING)+ ']' -> ^(ARRAY_STRING["STR ARY",$l] $s+) )
>   ;
> 
> [Andy - This causes compiler errors, and I'm really not sure what you are 
> getting at.]

compiler errors are odd here; maybe the C# target does not support the
[ ] notation for initializing an imaginary token?

> i think this last form will simplify subsequent processing of the tree.
> note also the proper initialization of the imaginary tokens.
> 
> [Andy - what do you mean "proper initialization of the imaginary tokens"]

the stuff between the [ ] on the right hand side of the -> is
information used to initialize the imaginary token. in Java, they get
translated into parameters to its constructor. i refer you to Dr. Parr's
book to find out more about this feature.
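
as a rough picture of what that initialization buys you, here is a
self-contained Java sketch. it does not use the ANTLR runtime: Tok, the type
numbers and the imaginary() helper are all invented for illustration. the
idea is only that the imaginary node gets its own type and text but borrows
its line and column from a real token such as $l, so later error messages
can point at the source.

```java
// Toy model of initializing an imaginary token from a real one.
final class ImaginaryTokenSketch {

    // Minimal stand-in for a token: type, text and source position.
    static final class Tok {
        final int type;
        final String text;
        final int line;
        final int column;

        Tok(int type, String text, int line, int column) {
            this.type = type;
            this.text = text;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            return text + "@" + line + ":" + column;
        }
    }

    // Hypothetical token type numbers.
    static final int LBRACK = 1;
    static final int ARRAY_FLOAT = 2;

    // New type and text, position copied from the token that was matched.
    static Tok imaginary(int type, Tok from, String text) {
        return new Tok(type, text, from.line, from.column);
    }

    public static void main(String[] args) {
        Tok l = new Tok(LBRACK, "[", 3, 7);           // the '[' labelled l
        Tok root = imaginary(ARRAY_FLOAT, l, "FLT ARY");
        System.out.println(root);                      // FLT ARY@3:7
    }
}
```

the argument order in the helper (token first, then text) follows my memory
of the Java runtime's TreeAdaptor.create(int tokenType, Token fromToken,
String text); if the C# target passes the [ ] arguments through in the order
written, swapping them in the grammar might be worth a try, but i have not
checked that.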



seems like i probably wasn't very helpful to you after all, sorry about
that...
   -jbb

