date:20111123

[il-antlr-interest: 35003] [antlr-interest] Eliminate characters in TOKEN

2011-11-23 Thread Rampon Jerome



Hi,


I tried to eliminate some space characters from some complex tokens,
but ran into issues with v3 (I think it was OK with v2).


The following reduced rules work fine for retrieving my token ID:

start
:  id=ID EOF
   { System.out.println("text is: " + $id.text); }
;

ID
:  ('a'..'z'|'A'..'Z')((' ')?('a'..'z'|'A'..'Z'|'0'..'1'))
;
But if I add ! after ' ' to eliminate it from the returned token text:

ID
:  ('a'..'z'|'A'..'Z')((' '!)?('a'..'z'|'A'..'Z'|'0'..'1'))
;
it complains that the output option must be AST.
If I add that to my grammar options, it still complains and returns an error.
It seems to add the option automatically if it is not there, but it still returns an error later on???

Is that normal?
Is there any simple way to work around this other than a later replaceAll? I would prefer to keep
it target independent.


Thanks

Jerome

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.

[il-antlr-interest: 35004] Re: [antlr-interest] Eliminate characters in TOKEN

2011-11-23 Thread Bart Kiers

Hi Rampon,


On Wed, Nov 23, 2011 at 10:54 AM, Rampon Jerome wrote:

> ...
> it complained on output option to be AST.
> If I add it in my grammar options if complains and still return error
> It seems it automatically adds if not there but later on still return
> error ???
>
> Is that normal ?
>

Yes, the `!` to exclude characters from lexer rules (as was possible in v2)
is no longer valid in v3 grammars.



> Any simple way to bypass rather than a later replaceAll. I would prefer to
> keep it target independent
>

No, that's not possible.

Regards,

Bart.
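Since v3 has no `!` for lexer rules, the dropped character has to be handled by hand. As a hedged illustration in plain Java (not generated ANTLR code; `IdMatcher` and `match` are hypothetical names), the sketch below matches Jerome's reduced ID shape and, like the v2 `' '!`, consumes the optional space without copying it into the token text. The digit range `'0'..'1'` is copied from the rule exactly as posted.

```java
/** Hypothetical matcher mimicking v2's `' '!`: the space is consumed but not kept. */
final class IdMatcher {

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Second character: a letter or a digit from the range as posted ('0'..'1').
    private static boolean isTail(char c) {
        return isLetter(c) || c == '0' || c == '1';
    }

    /**
     * Matches ('a'..'z'|'A'..'Z') (' '!)? ('a'..'z'|'A'..'Z'|'0'..'1') at the
     * start of the input and returns the token text without the space,
     * or null if the input does not start with an ID.
     */
    static String match(String input) {
        StringBuilder text = new StringBuilder();
        int pos = 0;
        if (pos >= input.length() || !isLetter(input.charAt(pos))) {
            return null;
        }
        text.append(input.charAt(pos++));
        if (pos < input.length() && input.charAt(pos) == ' ') {
            pos++; // consumed, but deliberately not appended (the v2 `!` effect)
        }
        if (pos >= input.length() || !isTail(input.charAt(pos))) {
            return null;
        }
        text.append(input.charAt(pos));
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(match("a b"));  // ab
        System.out.println(match("x1"));   // x1
        System.out.println(match("a  b")); // null: only one space is optional
    }
}
```

In the Java target the same effect could be had from an action that rewrites the token text after matching, but that ties the grammar to one target, which is exactly what Jerome wanted to avoid.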


[il-antlr-interest: 35005] Re: [antlr-interest] Token and EOL

2011-11-23 Thread Sam Barnett-Cormack

A common way is to make the lexer bundle up strings: this means that \n
doesn't become a whitespace token on a hidden channel until after the
string has finished.

Put another way, lexer rules see all characters, regardless of any other
rules that shunt those characters to a hidden channel. The hiding only takes
effect for the parser, not the lexer.
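Because the string rule itself sees the `\n`, it can simply refuse to cross a line end. In an ANTLR 3 grammar that would be roughly `STRING : '\'' ( '\'\'' | ~('\'' | '\r' | '\n') )* '\'' ;`. The hedged plain-Java sketch below spells out the same logic by hand (the class, method, and exception names are hypothetical, not ANTLR API): a doubled quote stays inside the literal, and reaching a newline or the end of input before the closing quote is an error.

```java
/** Hypothetical hand-written version of a Pascal string rule that must end on its own line. */
final class PascalStringScanner {

    /** Thrown when a literal hits a line end or end of input before its closing quote. */
    static final class UnterminatedStringException extends RuntimeException {
        UnterminatedStringException(int offset) {
            super("unterminated string literal at offset " + offset);
        }
    }

    /**
     * Scans a Pascal string starting at input.charAt(start) == '\'' and returns
     * the index just past the closing quote. A doubled quote ('') is an
     * embedded quote, not the end of the literal.
     */
    static int scan(String input, int start) {
        if (start >= input.length() || input.charAt(start) != '\'') {
            throw new IllegalArgumentException("no string literal at offset " + start);
        }
        int pos = start + 1;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '\'') {
                if (pos + 1 < input.length() && input.charAt(pos + 1) == '\'') {
                    pos += 2; // '' inside the literal: keep going
                    continue;
                }
                return pos + 1; // closing quote
            }
            if (c == '\n' || c == '\r') {
                break; // the lexer still sees the newline, so the string stops here
            }
            pos++;
        }
        throw new UnterminatedStringException(start);
    }

    public static void main(String[] args) {
        String ok = "'it''s' + x";
        System.out.println(ok.substring(0, scan(ok, 0))); // 'it''s'
        try {
            scan("'broken\n'", 0);
        } catch (UnterminatedStringException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whatever later routes `\n` to a hidden channel does not matter here: the decision is made inside the string rule, before any channel assignment reaches the parser.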

Sam

On 23/11/2011 07:20, Borneq wrote:
> End of line does not end multiline comments, but in Pascal a string literal
> must end at the end of the line. How do I do that? Strings and \n are in another channel.
> How do I define in the grammar that a string must end at EOL?
>


[il-antlr-interest: 35006] Re: [antlr-interest] Eliminate characters in TOKEN

2011-11-23 Thread Ruslan Zasukhin

On 11/23/11 11:59 AM, "Bart Kiers"  wrote:

> Hi Rampon,
> 
> 
> On Wed, Nov 23, 2011 at 10:54 AM, Rampon Jerome wrote:
> 
>> ...
>> it complained on output option to be AST.
>> If I add it in my grammar options if complains and still return error
>> It seems it automatically adds if not there but later on still return
>> error ???
>> 
>> Is that normal ?
>> 
> 
> Yes, the `!` to exclude characters from lexer rules (as was possible in v2)
> is no longer valid in v3 grammars.

Yes, I also ran into this change in v3.
Here are examples from our Valentina SQL grammar, where we use a new trick to
avoid, e.g., wrapper quotes:


//--

// String literals:

// caseSensitive = false, so we use only lowercase chars.
fragment
Letter
    :   'a'..'z'
    |   '@'
    ;


fragment
EscapeSequence
    :   '\\' ( QUOTE | '\\' | 'b' | 't' | 'n' | 'f' | 'r' )
    ;


STRING_LITERAL
@init
{
    int escape_count = 0;
    int theStart = $start;
}
    :   QUOTE
        { theStart = GETCHARINDEX(); }  // skip first quote

        (   EscapeSequence   { ++escape_count; }
        |   QUOTE QUOTE      { ++escape_count; }
        |   ~( QUOTE | '\\' )
        )*

        {
            $start = theStart;
            EMIT();

            // Optimization: the lexer has found the escaped chars, and we even count them.
            // We pass this info to the parser/tree parser inside the token,
            // so later algorithms can avoid one more scan of the literal to check whether
            // there are any symbols to unescape. Knowing how many such symbols there are,
            // the algorithm can return immediately once all known escapes are resolved.
            // This also helps to calculate accurately the RAM needed for the unescaped string.
            //
            LTOKEN->user1 = escape_count;
        }

        QUOTE  // and skip last quote
    ;
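The rule above is written for the C target (`GETCHARINDEX()`, `EMIT()`, `LTOKEN->user1`). To show what the trick computes independently of any target, here is a hedged plain-Java sketch: it records the position just after the opening quote, counts escape sequences and doubled quotes, and returns the raw text between the quotes with the escapes left in place. It assumes `QUOTE` is a single quote `'`; the class and field names are hypothetical.

```java
/** Hypothetical plain-Java rendering of the STRING_LITERAL trick: drop the wrapper quotes, count escapes. */
final class SqlStringScanner {

    /** Raw text between the quotes (escapes left as-is), escape count, and end index. */
    static final class Literal {
        final String text;
        final int escapeCount;
        final int end; // index just past the closing quote

        Literal(String text, int escapeCount, int end) {
            this.text = text;
            this.escapeCount = escapeCount;
            this.end = end;
        }
    }

    private static final char QUOTE = '\''; // assumption: QUOTE is a single quote

    // Characters allowed after a backslash, as in the EscapeSequence fragment.
    private static boolean isEscapable(char c) {
        return c == QUOTE || c == '\\' || "btnfr".indexOf(c) >= 0;
    }

    static Literal scan(String in, int start) {
        if (start >= in.length() || in.charAt(start) != QUOTE) {
            throw new IllegalArgumentException("expected opening quote at " + start);
        }
        int theStart = start + 1; // like GETCHARINDEX() right after the first QUOTE
        int pos = theStart;
        int escapeCount = 0;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            if (c == '\\' && pos + 1 < in.length() && isEscapable(in.charAt(pos + 1))) {
                escapeCount++; // EscapeSequence
                pos += 2;
            } else if (c == QUOTE && pos + 1 < in.length() && in.charAt(pos + 1) == QUOTE) {
                escapeCount++; // QUOTE QUOTE
                pos += 2;
            } else if (c == QUOTE) {
                // Text stops before the closing quote, as EMIT() does before the last QUOTE.
                return new Literal(in.substring(theStart, pos), escapeCount, pos + 1);
            } else if (c == '\\') {
                throw new IllegalArgumentException("invalid escape at " + pos);
            } else {
                pos++;
            }
        }
        throw new IllegalArgumentException("unterminated literal starting at " + start);
    }

    public static void main(String[] args) {
        Literal lit = scan("'O''Brien\\n' rest", 0);
        System.out.println(lit.text + " / escapes=" + lit.escapeCount); // O''Brien\n / escapes=2
    }
}
```

Carrying `escapeCount` next to the text mirrors the `user1` field: a later unescaping pass can skip literals whose count is zero.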





//---
IDENT
    :   ( Letter | '_' ) ( Letter | '_' | Digit )*
    ;


DELIMITED  // delimited_identifier
@init
{
    $type = IDENT;
    int theStart = $start;
}
    :   (   DQUOTE  { theStart = GETCHARINDEX(); }
            ( ~(DQUOTE) | DQUOTE DQUOTE )+
            { $start = theStart; EMIT(); }
            DQUOTE

        |   BQUOTE  { theStart = GETCHARINDEX(); }
            ( ~(BQUOTE) | BQUOTE BQUOTE )+
            { $start = theStart; EMIT(); }
            BQUOTE

        // valentina/oracle extension: [asasas '' " sd "]
        |   LBRACK  { theStart = GETCHARINDEX(); }
            ( ~(']') )+
            { $start = theStart; EMIT(); }
            RBRACK
        )
    ;
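The same idea applied to DELIMITED, again as a hedged plain-Java sketch rather than target code. It assumes `DQUOTE` is `"`, `BQUOTE` is a backquote, and `LBRACK`/`RBRACK` are `[` and `]`; `DelimitedIdent` and `strip` are hypothetical names. As in the rule, the body must be non-empty, a doubled `"` or backquote stays in the raw text, and the bracket form simply runs to the first `]`.

```java
/** Hypothetical plain-Java rendering of DELIMITED: strip the wrapper, keep the inner text raw. */
final class DelimitedIdent {

    /** Returns the text between the delimiters, or throws if the identifier is malformed. */
    static String strip(String in) {
        if (in.isEmpty()) {
            throw new IllegalArgumentException("empty input");
        }
        char open = in.charAt(0);
        char close;
        boolean doubling; // may the closing delimiter be escaped by doubling it?
        if (open == '"' || open == '`') {
            close = open;
            doubling = true;
        } else if (open == '[') {
            close = ']';
            doubling = false;
        } else {
            throw new IllegalArgumentException("not a delimited identifier: " + in);
        }
        int pos = 1;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            if (c == close) {
                if (doubling && pos + 1 < in.length() && in.charAt(pos + 1) == close) {
                    pos += 2; // "" or `` stays in the raw text
                    continue;
                }
                if (pos == 1) {
                    throw new IllegalArgumentException("empty delimited identifier");
                }
                return in.substring(1, pos); // like EMIT() before the closing delimiter
            }
            pos++;
        }
        throw new IllegalArgumentException("missing closing delimiter: " + in);
    }

    public static void main(String[] args) {
        System.out.println(strip("\"my col\""));           // my col
        System.out.println(strip("`a``b`"));               // a``b
        System.out.println(strip("[asasas '' \" sd \"]")); // asasas '' " sd "
    }
}
```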




-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]



[il-antlr-interest: 35008] [antlr-interest] Lexer error reporting

2011-11-23 Thread Bill Andersen

Hi Folks...

I've been trying to figure out how to shut off the default lexer behavior of printing
messages to System.err, such as:

line 2:4 no viable alternative at character ' '

Instead, I'd like to catch these and do something with them. Overriding
reportError(RecognitionException) doesn't work, and no other option seems
obvious.

Doing this with a parser is easy: I just overrode emitErrorMessage in a custom
subclass of Parser. I've got that done already, but I can't seem to find out how to
do the same in the lexer.

Any help appreciated.

.bill



[il-antlr-interest: 35009] Re: [antlr-interest] Lexer error reporting

2011-11-23 Thread Bart Kiers

Hi Bill,


On Wed, Nov 23, 2011 at 5:41 PM, Bill Andersen wrote:

> Hi Folks...
>
> Been trying to figure out how to shut off default Lexer behavior to print
> messages to System.err, such as:
>
>line 2:4 no viable alternative at character ' '
>
> Instead, I'd like to catch these and do something with them.  Overriding
> reportError(RecognitionException) doesn't work and no other option seems
> obvious.


Both the lexer and parser have a `reportError(...)` method, and my guess is
that you did something like this:

@members {
  @Override
  public void reportError(RecognitionException e) ...
}

which is a short-hand for:

@parser::members { // note the `parser::`
  @Override
  public void reportError(RecognitionException e) ...
}

But since a "no viable alternative" error is something that comes from the
lexer, you need to explicitly override the lexer method like this:

@lexer::members {
  @Override
  public void reportError(RecognitionException e) {
    System.out.println("CUSTOM ERROR...");
  }
}

Regards,

Bart.
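To "do something with" the errors rather than print them, the lexer override can forward each one to a collector. The hedged sketch below is plain Java; `LexerErrorCollector` and its methods are hypothetical. In the grammar, the body of the `reportError(RecognitionException e)` override in `@lexer::members` could call `add(e.line, e.charPositionInLine, ...)` using the public fields of ANTLR 3's `RecognitionException`; the message text itself is assumed to come from wherever you build it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sink for lexer errors: collects messages instead of printing them to System.err. */
final class LexerErrorCollector {

    private final List<String> messages = new ArrayList<>();

    /** Records an error using the same "line L:C message" layout as the default output. */
    void add(int line, int charPositionInLine, String message) {
        messages.add("line " + line + ":" + charPositionInLine + " " + message);
    }

    boolean hasErrors() {
        return !messages.isEmpty();
    }

    List<String> getMessages() {
        return Collections.unmodifiableList(messages);
    }

    public static void main(String[] args) {
        LexerErrorCollector errors = new LexerErrorCollector();
        // What a lexer override might forward instead of calling System.err.println:
        errors.add(2, 4, "no viable alternative at character ' '");
        for (String m : errors.getMessages()) {
            System.out.println("collected: " + m); // collected: line 2:4 no viable alternative at character ' '
        }
    }
}
```

Keeping the messages in a list lets the caller decide after lexing whether to abort, log, or show them in an editor.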


[il-antlr-interest: 35011] [antlr-interest] [job] Small paid work on Grammar

2011-11-23 Thread Adam Retter

Hello, I don't know Antlr, but I need some small changes made to an
existing Antlr grammar.

I would take the time to learn it myself, but I need these changes
immediately, and my initial experiments have been fruitless.
Whilst I don't understand Antlr syntax, I do understand some things
about grammars, and I believe that these changes should be relatively
trivial: just one or two hours, I would imagine. I realise no one will
want to do the work for me for free, so I am prepared to pay for it, but
this is coming out of my own pocket, so there is a limit.

If anyone is interested, please get back to me with cost.

Details: basically, there is an XQuery 1.0 parser written in Antlr
here:
http://exist.svn.sourceforge.net/viewvc/exist/trunk/eXist/src/org/exist/xquery/parser/
See XQuery.g for a brief overview in the comments at the top.

I basically just need to add Annotations from the XQuery 3.0
specification to our existing XQuery 1.0 parser:
http://www.w3.org/TR/xquery-30/#id-annotations
http://www.w3.org/TR/xquery-30/#id-grammar

This is just for prototyping, so it does not matter
that the end result will be an XQuery 1.0 parser with support for XQuery
3.0 annotations.

Thanks

Adam

