[il-antlr-interest: 34812] Re: [antlr-interest] file size

2011-11-08 Thread A Z
  It depends on the grammar. The combined count for all my ANTLR-generated
C code is >30 lines.



On Tue, Nov 8, 2011 at 12:38 PM, yushang  wrote:

> Sounds sad :( It seems there is a lot of work to do.
>
> 2011/11/9 Justin Murray 
>
> > That sounds very large to me. I have a parser that generates 97940 lines
> > of C, and this is for a terribly ambiguous language with lots of
> > backtracking. I would recommend doing some left-factoring and maybe adding
> > some predicates to trim that down a bit.
> >
> > On 11/8/2011 1:22 PM, yushang wrote:
> > > Hi everyone,
> > > Do you think a parser file (C file) with 408284 lines is a little big?
> > > What is the biggest parser you've ever seen? Maybe I need to optimize
> > > my grammar?
> > > Many thanks
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 34762] [antlr-interest] C Target: Stopping parse on first error

2011-11-06 Thread A Z
In the C target, is there a way to arbitrarily return from the parser? For
instance, if an include file is not found, ANTLR never sees an exception, but
I would like to stop parsing that file immediately. Could I possibly insert
an EOF token before continuing with the parse?

Thanks




[il-antlr-interest: 34661] [antlr-interest] v4 Token size

2011-10-29 Thread A Z
Hello,

  I saw some earlier messages from Sam regarding a much smaller token he was
developing for C#. Is ANTLR v4 going to have smaller tokens in the C target, or
perhaps use C++? The tool works great, but I had to remove many of the
function pointers from commonToken to reduce memory usage.

Thanks




[il-antlr-interest: 34208] [antlr-interest] C-target 3.2 : getParent returns invalid pointer

2011-09-28 Thread A Z
Hello All,

I'm trying to run a tree parser on part of a tree and then delete that
sub-tree but I can't get the parent of the node that needs to be deleted.

//For each i in pPackageList
pANTLR3_BASE_TREE thisNode = pPackageList->at(i);

pANTLR3_BASE_TREE parentNode = thisNode->getParent(thisNode);

unsigned int thisIndex = thisNode->getChildIndex(thisNode);

runPackageTree(thisNode);

parentNode->deleteChild(parentNode,thisIndex);

After calling getParent, parentNode is 0x38 which isn't a valid pointer. The
parent for thisNode should be a NilNode so I'm wondering if that is causing
a problem. I tried using deleteChild() on the node returned from the parser
rule but it still segfaults. Is there something special needed to remove a
child from the root node?




[il-antlr-interest: 33348] Re: [antlr-interest] How to make the lexer thread-safe (C target)?

2011-07-26 Thread A Z
I think you can do this using the context block.

//grammar.g
@lexer::context {
  lexerData * pLexerData;
  ANTLR3_UINT32 defaultChannel;
}

//grammar.h
struct grammar_Ctx_struct
{

...
//Function pointers
...

  lexerData * pLexerData;
  ANTLR3_UINT32 defaultChannel;

};
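
As a standalone illustration of why this helps (no ANTLR runtime involved), the sketch below keeps the quote-tracking flag in a per-instance struct instead of a file-scope global, so two lexers running in different threads never share it. LexerCtx, lexer_feed and ends_inside_quotes are hypothetical names made up for this example; in the real generated code the field would sit in the context struct shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the generated lexer context struct: the state
 * lives in the instance, not in a file-scope global. */
typedef struct LexerCtx {
    bool double_quoted;   /* toggled on each unescaped '"' */
} LexerCtx;

/* Feed one character; prev is the preceding character (0 at the start).
 * Mirrors the LA(-1) != '\\' check from the DQUOTE rule. */
static void lexer_feed(LexerCtx *ctx, char prev, char c)
{
    assert(ctx != NULL);
    if (c == '"' && prev != '\\')
        ctx->double_quoted = !ctx->double_quoted;
}

/* Scan a whole string and report whether it ends inside double quotes. */
static bool ends_inside_quotes(LexerCtx *ctx, const char *text)
{
    char prev = 0;
    ctx->double_quoted = false;
    for (const char *p = text; *p != '\0'; ++p) {
        lexer_feed(ctx, prev, *p);
        prev = *p;
    }
    return ctx->double_quoted;
}
```

Each thread would own its own LexerCtx, so no locking is needed for this flag.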


On Tue, Jul 26, 2011 at 9:04 AM, Gokulakannan Somasundaram <
gokul...@gmail.com> wrote:

> Jim,
>   Have you thought about providing a way to include variables in the
> Lexer/Parser structure that gets created? Currently the Lexer/Parser
> structure that gets created because of the grammar only stores the function
> pointers.
>
> Thanks,
> Gokul.
>
> On Tue, Jul 26, 2011 at 9:51 PM, Gokulakannan Somasundaram <
> gokul...@gmail.com> wrote:
>
> > I had a similar requirement. I maintained a class LexerContext in a
> > Thread local variable and accessed it and modified it. There might be
> > better solutions.
> >
> > Thanks,
> > Gokul.
> >
> >
> > On Tue, Jul 26, 2011 at 8:07 PM, Mu Qiao  wrote:
> >
> >> Hello,
> >>
> >> My lexer has to rely on some internal status like the following:
> >>
> >> DQUOTE  :   '"' { if(LA(-1) != '\\') double_quoted = !double_quoted; };
> >> SQUOTE  :   { double_quoted }? => '\'';
> >> SINGLE_QUOTED_STRING_TOKEN  :   { !double_quoted }? => '\'' .* '\'';
> >>
> >> "double_quoted" is a bool variable declared in the @members section. The
> >> generated code will declare it in global scope, which is not thread
> >> safe. I wonder if there is any way to make the lexer thread-safe? For
> >> example, declare the variable in xxxLexer_Ctx_struct.
> >>
> >> --
> >> Best wishes,
> >> Mu Qiao
> >> GnuPG fingerprint: 92B1 B0C4 8D14 F8C4 EFA5  3ACC 30B3 0DE4 17B1 57E9



[il-antlr-interest: 32925] Re: [antlr-interest] ANTLR 3.4 rc3 (aka beta3)

2011-06-24 Thread A Z
g++ and Clang++ now both compile all my grammars built with 3.4.

I'm not sure how to build the runtime library, and tell g++ to use it.

Ad.


On Thu, Jun 23, 2011 at 6:03 PM, Terence Parr  wrote:

> hi, Jim fixed up the C runtime. please test it out:
>
> http://antlr.org/download/antlr-master-3.4-beta3-completejar.jar
> http://antlr.org/download/antlr-master-3.4-beta3-src.jar
> http://antlr.org/download/antlr-runtime-3.4-beta3-sources.jar
> http://antlr.org/download/antlr-runtime-3.4-beta3.jar
>
> Ter



[il-antlr-interest: 32853] Re: [antlr-interest] test release of antlr 3.4

2011-06-20 Thread A Z
Thanks, that Java command worked for me.

The good news is the C code was generated with no grammar issues.
Unfortunately the C code doesn't compile:

- The characters '\>=' are used in comparisons
- '_empty' is not defined
- There are lots of these, with varying numbers of altSwitchCase(), which is
also not defined.
switch (alt48)
{

altSwitchCase(i,a)altSwitchCase(i,a)altSwitchCase(i,a)altSwitchCase(i,a);
}
- Many functions starting with 'FOLLOW_set_in_*' are not defined. These seem
to be related to rules in which all the alternatives are single tokens.



On Mon, Jun 20, 2011 at 11:28 AM, Julien BLACHE  wrote:

> A Z  wrote:
>
> Hi,
>
> > How is this jar different than 3.2? I tried the same command
> >
> >>java -jar antlr-3.4.jar grammar.g
> >
> > but I get an error message:
> > "Invalid or corrupt jarfile antlr-3.4.jar"
>
> Same issue here. It works when invoked this way
>  java -cp antlr-3.4.jar org.antlr.Tool grammar.g
>
> I'll leave it up to the Java-literate to investigate/explain/fix ;)
>
> (Sun^WOracle Java 1.6.0_26 if it makes any difference)
>
> JB.
>
> --
> Julien BLACHE   <http://www.jblache.org>
>   GPG KeyID 0xF5D65169



[il-antlr-interest: 32842] Re: [antlr-interest] test release of antlr 3.4

2011-06-20 Thread A Z
How is this jar different than 3.2? I tried the same command

>java -jar antlr-3.4.jar grammar.g

but I get an error message:

"Invalid or corrupt jarfile antlr-3.4.jar"



On Mon, Jun 20, 2011 at 5:55 AM, Julien BLACHE  wrote:

> Hi,
>
> > The other thing is that the C target has not been tested at all
> > really. If someone could report back on how it works, that would be
> > great ( including the debugging socket protocol to ANTLRWorks).
>
> I gave it a very quick try on my grammars (written for 3.2) and it
> doesn't look good :/ I've been following the list for some time but I
> can't remember what, if any, adaptations are needed for 3.4, so I'll
> check back on that (pointers appreciated if you have some handy).
>
> Where can I find the C runtime for 3.4? The download directory still has
> an old 3.3 snapshot.
>
> Right now the code generated by antlr is incomplete; generated C files
> are much shorter than those generated by 3.2 and actions are missing
> entirely in my AST grammar.
>
> Quick comparison, line counts:
>                ANTLR 3.2   ANTLR 3.4
> DAAP2SQL.c     927         585
> DAAPLexer.c    1092        934
> DAAPParser.c   1014        901
>
> A showstopper for C code is that -depends is broken; the only output is:
>  /dependencies()
>
> There also seems to be an issue wrt locale settings; unless running with
> LC_ALL=C, antlr bails out on me looking for
> org/antlr/tool/templates/messages/languages/fr.stg.
>
> Cosmetic issue, but still, -version returns:
>  ANTLR Parser Generator  Version ${project.version} ${buildNumber}
>
> JB.
>
> --
> Julien BLACHE   
>   GPG KeyID 0xF5D65169



[il-antlr-interest: 32790] [antlr-interest] C Target - Where is it safe to switch input streams?

2011-06-15 Thread A Z
Hello all,

  In the example C grammar, PUSHSTREAM() is called in the middle of a lexer
rule to implement the include preprocessor directive. I followed this
approach and everything works fine except the current token is incomplete.
The channel and USER values get set but the text is blank. All the tokens
from the included file are correct.

My  __LINE__ directives work the same way as includes and look something
like this:

DIR_LINE :
  '`__LINE__'
{ctx->pLexerData->dirLineText();runNewBuffer(ctx);$channel=HIDDEN;};

static void runNewBuffer(pSVLexer ctx)
{
    pANTLR3_INPUT_STREAM input =
        antlr3NewAsciiStringInPlaceStream(newBufferData,newBufferSize,newBufferName);

    if(input == NULL)
    {
        ANTLR3_FPRINTF(stderr, "Unable to open buffer %s due to malloc() failure1\n",
                       newBufferName);
        return; //Do not push a NULL stream
    }

    PUSHSTREAM(input);

    ctx->pLexerData->inputList.push(input); //Mirror what ANTLR does
}

I found that if I remove runNewBuffer(), the DIR_LINE token correctly gets
its text set to "`__LINE__"; otherwise the text is blank. The channel is
set correctly in either case. While trying to fix this I found it crashes if
PUSHSTREAM is placed after emit() inside nextTokenStr(). If I move it to the
nextToken() function, it seems to work as intended. Are there any side
effects to switching the input streams inside nextToken()? Is there another
way to run an action after a certain token has been completed?

Thanks




[il-antlr-interest: 32568] Re: [antlr-interest] Why stream name can't be printed out when error occurs(ANTLR C)?

2011-05-26 Thread A Z
For development, I handled tree grammar error reporting with the following
code. It simply searches back in the token stream until it hits an input
token and uses that for location information. I also print out the token
index offset between the error token and the reporting token so I get an
idea of where the error occurred. Note that I stopped changing this as soon
as it was working well enough to allow debugging the tree grammar, so don't
expect much.


  pANTLR3_COMMON_TOKEN searchToken = NULL;
  unsigned int testIndex = 0;
  if(ex->index > 0)
  {
    testIndex = ex->index - 1;
    pANTLR3_BASE_TREE searchBaseTree =
      thisTreeNodeStream->get(thisTreeNodeStream,testIndex);
    searchToken = searchBaseTree->getToken(searchBaseTree);

    //If UP token
    if(errorToken->type == 3)
    {
      //Go back in the stream until we hit a token that is not UP and has a
      //column not -1
      while(searchToken->type == 3 || searchToken->charPosition == -1)
      {
        testIndex--; //FIXME - Maybe dangerous?
        searchBaseTree =
          thisTreeNodeStream->get(thisTreeNodeStream,testIndex);
        searchToken = searchBaseTree->getToken(searchBaseTree);
        //printf("searchToken->toString %s\n",searchToken->toString(searchToken)->chars);
      }
    }
    //If DOWN token
    else if(errorToken->type == 2)
    {
      //Go back in the stream until we hit a token that is not DOWN
      while(searchToken->type == 2)
      {
        testIndex--; //FIXME - Maybe dangerous?
        searchBaseTree =
          thisTreeNodeStream->get(thisTreeNodeStream,testIndex);
        searchToken = searchBaseTree->getToken(searchBaseTree);
        //printf("searchToken->toString %s\n",searchToken->toString(searchToken)->chars);
      }
    }
    //If no column info (imaginary token?)
    else if(errorToken->charPosition == -1)
    {
      //Go back in the stream until we hit a token that is not UP or DOWN and
      //has a column not -1
      while(searchToken->type == 3 || searchToken->type == 2 ||
            searchToken->charPosition == -1)
      {
        testIndex--; //FIXME - Maybe dangerous?
        searchBaseTree =
          thisTreeNodeStream->get(thisTreeNodeStream,testIndex);
        searchToken = searchBaseTree->getToken(searchBaseTree);
        //printf("searchToken->user1 %d\n",searchToken->user1);
        //printf("searchToken->type %d\n",searchToken->type);
        //printf("searchToken->toString %s\n",searchToken->toString(searchToken)->chars);
      }
    }
    else
    {
      searchToken = errorToken;
    }
  }
  //This must be the first node in the tree
  else
  {
    searchToken = errorToken;
  }
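
The FIXME on testIndex-- is a real concern: testIndex is unsigned, so decrementing it past 0 wraps around. Below is a standalone sketch of the same backward search with a bounds check, written against a plain array rather than the ANTLR tree node stream. Tok, has_location and find_reporting_token are hypothetical names for this example, and the three cases above are collapsed into a single predicate, which is a simplification and not exactly what the code above does.

```c
#include <assert.h>
#include <stddef.h>

enum { TOK_DOWN = 2, TOK_UP = 3 };  /* same literal values as the code above */

/* Hypothetical minimal token: only the fields the search looks at. */
typedef struct Tok {
    int type;
    int charPosition;   /* -1 means no location (imaginary token) */
} Tok;

/* Nonzero when the token carries usable location information. */
static int has_location(const Tok *t)
{
    return t->type != TOK_UP && t->type != TOK_DOWN && t->charPosition != -1;
}

/* Walk backwards from errorIndex to the nearest token with a location.
 * Returns its index, or -1 if no such token exists. */
static long find_reporting_token(const Tok *stream, size_t count,
                                 size_t errorIndex)
{
    assert(stream != NULL || count == 0);
    if (count == 0)
        return -1;
    if (errorIndex >= count)
        errorIndex = count - 1;
    /* i runs errorIndex, errorIndex-1, ..., 0 without ever wrapping. */
    for (size_t i = errorIndex + 1; i-- > 0; ) {
        if (has_location(&stream[i]))
            return (long)i;
    }
    return -1;
}
```

The difference between errorIndex and the returned index gives the token offset mentioned above.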




[il-antlr-interest: 32424] [antlr-interest] C Target - Assigning rule info to imaginary tokens

2011-05-10 Thread A Z
Hello all,

  I have a case where I need to assign an imaginary token the attributes of
a token inside a rule. I tried the below but as expected it does not have
the desired effect.


var_or_function :
  identifier
  (
LPARAN arg_list RPARAN
  -> ^(I_FUNC[identifier] arg_list)
  |
  -> I_UNKN[identifier]
  );

identifier :
SIMPLE_IDENT
  | ESCAPED_IDENT;


Is there any way to do this without merging the two lexer rules into one
token?

Thanks.




[il-antlr-interest: 32344] Re: [antlr-interest] "Balanced Matching" and ANTLR

2011-04-29 Thread A Z
Is this what you mean?

module_body :
  (~(K_MODULE | K_ENDMODULE) | (K_MODULE module_body K_ENDMODULE))+;

module_declaration :
  key=K_MODULE lifetime? var=identifier module_body post=K_ENDMODULE (COLON
pident=identifier)?;

In the code below, the rule module_declaration will match only the outermost
module 'outside'.

module outside;

  module inside1;
  endmodule : inside1

  module inside2;
  endmodule

endmodule
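
To make the recursion in module_body explicit, here is a standalone sketch of the same balanced matching written as a small hand-rolled recursive-descent matcher over an array of token types. It does not use ANTLR; TokType, match_body and match_module are hypothetical names, and module_declaration is reduced to MODULE module_body ENDMODULE (lifetime, identifier and the trailing label are treated as ordinary tokens or left out).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_MODULE, T_ENDMODULE, T_OTHER } TokType;

/* module_body : ( ~(MODULE|ENDMODULE) | MODULE module_body ENDMODULE )+ ;
 * Advances *pos past the body. Returns 1 if at least one item matched. */
static int match_body(const TokType *t, size_t n, size_t *pos)
{
    size_t items = 0;
    while (*pos < n && t[*pos] != T_ENDMODULE) {
        if (t[*pos] == T_MODULE) {
            ++*pos;                                  /* consume MODULE */
            if (!match_body(t, n, pos))
                return 0;
            if (*pos >= n || t[*pos] != T_ENDMODULE)
                return 0;                            /* unbalanced */
            ++*pos;                                  /* consume ENDMODULE */
        } else {
            ++*pos;                                  /* any other token */
        }
        ++items;
    }
    return items > 0;
}

/* Simplified module_declaration: MODULE module_body ENDMODULE.
 * Returns the index one past the matching ENDMODULE, or 0 on failure. */
static size_t match_module(const TokType *t, size_t n, size_t start)
{
    size_t pos = start;
    assert(t != NULL);
    if (pos >= n || t[pos] != T_MODULE)
        return 0;
    ++pos;
    if (!match_body(t, n, &pos))
        return 0;
    if (pos >= n || t[pos] != T_ENDMODULE)
        return 0;
    return pos + 1;
}
```

Called at the first token of the example above, match_module consumes both inner modules as part of the body and stops after the final endmodule.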




On Thu, Apr 28, 2011 at 1:03 PM, Udo Weik  wrote:

> Hello,
>
> I'm just looking for common approaches for "Balanced Matching" with ANTLR,
> e. g. the typical problem for extracting methods/functions/procedures
> in source files. Any hints are highly appreciated.
>
>
> Many thanks and greetings
> Udo



[il-antlr-interest: 32204] Re: [antlr-interest] C target - Disabling lexer output for groups of tokens

2011-04-14 Thread A Z
Thanks for the response.  I couldn't find where nextToken is set but I see
it now in antlr3lexer.c

antlr3LexerNew(ANTLR3_UINT32 sizeHint, pANTLR3_RECOGNIZER_SHARED_STATE
state)
{
...
/* Install the default nextToken() method, which may be overridden
 * by generated code, or by anything else in fact.
 */
lexer->rec->state->tokSource->nextToken=  nextToken;
...
}
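
For anyone following along, the idea of installing your own pointer can be sketched without the ANTLR runtime. The struct below is a hypothetical, much reduced stand-in for the token source (the real one is ANTLR3_TOKEN_SOURCE), and the channel numbers are assumptions for the example. Note that this sketch wraps the original function rather than copying and editing it as Jim suggests, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

enum { CH_DEFAULT = 0, CH_HIDDEN = 99 };   /* assumed channel numbers */

typedef struct Token { int type; int channel; } Token;

typedef struct TokenSource TokenSource;
struct TokenSource {
    Token *(*nextToken)(TokenSource *src);     /* replaceable entry point */
    Token  *tokens;                            /* canned input for the sketch */
    size_t  count, pos;
    int     outputOff;                         /* persistent on/off switch */
    Token *(*baseNextToken)(TokenSource *src); /* saved original pointer */
};

/* Stand-in for the default nextToken(): hands out tokens in order. */
static Token *defaultNextToken(TokenSource *src)
{
    return src->pos < src->count ? &src->tokens[src->pos++] : NULL;
}

/* Replacement: while outputOff is set, every token goes to the hidden
 * channel, so no individual lexer rule has to check the flag. */
static Token *filteringNextToken(TokenSource *src)
{
    Token *t = src->baseNextToken(src);
    if (t != NULL && src->outputOff)
        t->channel = CH_HIDDEN;
    return t;
}

/* Install the replacement before the lexer is first used. */
static void installFilter(TokenSource *src)
{
    assert(src != NULL && src->nextToken != NULL);
    src->baseNextToken = src->nextToken;
    src->nextToken = filteringNextToken;
}
```

A lexer action would then only flip outputOff, and the setting persists until it is flipped back.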

On Thu, Apr 14, 2011 at 10:41 AM, Jim Idle  wrote:

> Implement your own next token method. There are two functions, this one
> and nextToken - copy them and make the change then install your pointer
> before calling the lexer.
>
> Jim
>
> > -Original Message-
> > From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
> > boun...@antlr.org] On Behalf Of A Z
> > Sent: Thursday, April 14, 2011 8:03 AM
> > To: antlr-interest@antlr.org
> > Subject: [antlr-interest] C target - Disabling lexer output for groups
> > of tokens
> >
> > Is there a way to persistently switch the lexer output on or off? Channel
> > assignments only last for one token. I know I can use skip() but then I
> > have to add the same code to every lexer rule (I have hundreds) where it
> > checks a boolean and then executes skip(). I also looked at changing
> > the default channel but I don't see how that can be done as the following
> > isn't a function pointer that can be reassigned:
> >
> > ANTLR3_INLINE static pANTLR3_COMMON_TOKEN
> > nextTokenStr(pANTLR3_TOKEN_SOURCE toksource)
> > {
> > ...
> > lexer->rec->state->channel=
> > ANTLR3_TOKEN_DEFAULT_CHANNEL;
> > ...
> > }
> >
> > Is there another way of doing this?
> >
> > Thanks
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> > email-address
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to 
il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.



[il-antlr-interest: 32197] [antlr-interest] C target - Disabling lexer output for groups of tokens

2011-04-14 Thread A Z
Is there a way to persistently switch the lexer output on or off? Channel
assignments only last for one token. I know I can use skip() but then I have
to add the same code to every lexer rule (I have hundreds) where it checks a
boolean and then executes skip(). I also looked at changing the default
channel but I don't see how that can be done as the following isn't a function
pointer that can be reassigned:

ANTLR3_INLINE static pANTLR3_COMMON_TOKEN
nextTokenStr(pANTLR3_TOKEN_SOURCE toksource)
{
...
lexer->rec->state->channel=
ANTLR3_TOKEN_DEFAULT_CHANNEL;
...
}

Is there another way of doing this?

Thanks




[il-antlr-interest: 32133] [antlr-interest] Preventing longest match in the lexer

2011-04-08 Thread A Z
Hello all,

Is there a way to force 'match first' among a group of tokens? In the code
below, if 'undef(' or 'undef ' is found, it matches DIR_MACRO regardless of
the predicate. I can see why it would do this, but I'm trying to find a way
to match the DIR_UNDEF rule without resorting to combining the two rules and
manually modifying the token type.


DIR_UNDEF :
  '`undef'
  SLSpace+ var0=SimpleIdent;

DIR_MACRO :
  '`' var0=SimpleIdent
  (
{cond1(var0) == true}? =>
  | {cond2(var0) == true}? => Args
  | //Both conditionals false
  );

fragment Args : ' '* '(' ;




[il-antlr-interest: 32079] Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?

2011-04-04 Thread A Z
I tried that approach when I first started with ANTLR but had difficulty
handling arbitrary token rearrangement. Early on I couldn't figure out how
to backtrack in the token stream in order to detect identifier construction
using macros. Something like the following requires that 'prefix' be lexed
again after macro substitution in order to detect if the string from suffix
and 'prefix' will be merged into one identifier.

`define suffix(name) name
prefix`suffix

We use this often in RTL for bus port lists. Even though the spec seems to
explicitly disallow this, Modelsim and DC will accept it. Lexing twice
solves this case easily but now the tokens point to a non-existent source.


On Mon, Apr 4, 2011 at 8:59 PM, Martin d'Anjou  wrote:

> Hi,
>
> Thanks to both of you for sharing your approaches. Right now I am pondering
> how to alter the sequence of tokens before they hit the parser. Intuitively
> I want to have three processing units (lexer, pre-processor, parser)
> connected together through io pipes of tokens (e.g. token fifos), but this
> is not how ANTLR was architected (it's how I would have done it in hardware
> though!).
>
> Martin
>
>
>
> On 11-04-04 09:25 AM, Sam Harwell wrote:
>
>> I used a hand-crafted implementation of TokenSource between the lexer and
>> parser. In the preprocessor, whenever I manipulated a token I used a new
>> token class derived from CommonToken (call it SubstitutedToken) which
>> contained a linked list leading from the effective position in the stream
>> (stored in CommonToken) all the way back to the original location (file
>> and position) of the token definition. When a CommonToken substitution
>> occurs, the linked list has one node containing the original source
>> position where defined. Whenever a SubstitutedToken substitution occurs, a
>> new node for the token's previous effective position is added to the
>> linked list and that new head pointer is stored in the new token.
>>
>> `define x 3
>> `define y `x
>> `y
>>
>> In this case, token `y is eventually replaced with a SubstitutedToken
>> which appears at (line 2, column 1, length 1, text "3") containing the
>> following linked list:
>>
>> Line 3, column 1, length 2 (list head, the location where `y was
>>   substituted with `x)
>> Line 2, column 11, length 2 (the location where `x was substituted with
>>   '3')
>> Line 1, column 11, length 1 (the actual source location where the token
>>   '3' is defined)
>>
>> This list allows true relative ordering of all tokens in the processed
>> source: when two tokens appear to be at the same location in the
>> preprocessed stream, you simply compare the positions of the first node in
>> the position list.
>>
>> Sam
>>
>> -Original Message-
>> From: antlr-interest-boun...@antlr.org
>> [mailto:antlr-interest-boun...@antlr.org] On Behalf Of A Z
>> Sent: Monday, April 04, 2011 12:13 AM
>> To: Martin d'Anjou
>> Cc: antlr-interest@antlr.org
>> Subject: Re: [antlr-interest] Q: how to incorporate a preprocessor in the
>> flow?
>>
>> Hi Martin,
>>
>>   I just completed an SV preprocessor which can parse UVM 1.0
>> successfully. After 2 revisions I settled on a completely separate
>> preprocessor (lexer and parser). As you saw, you need to tokenize the
>> macro_text in order to easily support macros with arguments and detect the
>> three escaped tokens `", `\`" and ``. I'm not sure how well a lexer-only
>> approach could handle cases where a macro substitution can merge text with
>> a previously lexed token. The separate approach still has flaws, such as
>> making good error reporting difficult. Of course I could be missing an
>> obvious easy solution.
>>
>>
>>
>> On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou  wrote:
>>
>>  Hello,
>>>
>>> I am trying to find a way to incorporate a preprocessor in the ANTLR
>>> flow. I thought of doing this before the lexer, but I need to tokenize
>>> the incoming char stream for macro substitution to be easy. I thought
>>> of doing it between the lexer and the parser, and replace the
>>> preprocessor tokens with their expansion before feeding the token
>>> stream to the parser, so I guess I would end up using something like
>>> the TokenRewriteStream??? Can someone steer me in the right direction
>>> please? Or should I be using lexer rule actions? In which case, any
>>> example on how to access 

[il-antlr-interest: 32062] Re: [antlr-interest] Q: how to incorporate a preprocessor in the flow?

2011-04-03 Thread A Z
Hi Martin,

  I just completed an SV preprocessor which can parse UVM 1.0 successfully.
After 2 revisions I settled on a completely separate preprocessor (lexer and
parser). As you saw, you need to tokenize the macro_text in order to easily
support macros with arguments and detect the three escaped tokens `", `\`"
and ``. I'm not sure how well a lexer-only approach could handle cases where
a macro substitution can merge text with a previously lexed token. The
separate approach still has flaws, such as making good error reporting
difficult. Of course I could be missing an obvious easy solution.



On Sun, Apr 3, 2011 at 9:51 PM, Martin d'Anjou  wrote:

> Hello,
>
> I am trying to find a way to incorporate a preprocessor in the ANTLR
> flow. I thought of doing this before the lexer, but I need to tokenize
> the incoming char stream for macro substitution to be easy. I thought of
> doing it between the lexer and the parser, and replace the preprocessor
> tokens with their expansion before feeding the token stream to the
> parser, so I guess I would end up using something like the
> TokenRewriteStream??? Can someone steer me in the right direction
> please? Or should I be using lexer rule actions? In which case, any
> example on how to access the token stream of the replacement token list
> of an identifier? Too many questions sorry.
>
> The language I am hoping to tokenize is SystemVerilog and has C-like
> preprocessor macros (`include, `ifdef, `define NAME(params,...), token
> concatenation, etc.).
>
> Regards,
> Martin
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>




[il-antlr-interest: 31306] Re: [antlr-interest] missing getTokenType(string) in ANTLR3C?

2011-02-01 Thread A Z
Hi Bastian,

  You may find these scripts useful. One creates a character array of all
the tokens, so you can index it with the token #define value to get the
string name. The second one post-processes the parser .c file to add a
rudimentary stack trace.  As Jim pointed out, you will need to move all
character strings from the parser rules to lexer rules or tokens in order to
get useful token names.
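
For readers without the attachments, the idea of a name table indexed by the token
#define value can be sketched in C as below. This is only an illustration: the token
values and the functions token_name and token_type are hypothetical, and the real
table would be generated from the grammar's token definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token values, standing in for the #defines in a generated header. */
#define LBRACKET     4
#define RBRACKET     5
#define SIMPLE_IDENT 6

/* Designated initializers (C99) place each name at the index of its #define
 * value; indexes that are not listed are NULL. */
static const char *const token_names[] = {
    [LBRACKET]     = "LBRACKET",
    [RBRACKET]     = "RBRACKET",
    [SIMPLE_IDENT] = "SIMPLE_IDENT",
};

#define TOKEN_COUNT (sizeof token_names / sizeof token_names[0])

/* Token type -> name; returns "<unknown>" for values with no entry. */
static const char *token_name(size_t type)
{
    if (type >= TOKEN_COUNT || token_names[type] == NULL)
        return "<unknown>";
    return token_names[type];
}

/* Name -> token type by linear search; returns -1 if the name is not listed. */
static int token_type(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < TOKEN_COUNT; i++)
        if (token_names[i] != NULL && strcmp(token_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

The reverse lookup is a plain linear scan, which is usually acceptable for a few
hundred tokens in error-reporting paths.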



insTree.pl
Description: Perl program


parseTok.pl
Description: Perl program



[il-antlr-interest: 31227] [antlr-interest] ANTLR language patterns for NEdit

2011-01-24 Thread A Z
Although there are many cases it doesn't handle correctly, some of you may
find this useful.



antlr.pats
Description: Binary data



[il-antlr-interest: 31200] [antlr-interest] ANTLR3C Single token stream with sequential parsers

2011-01-18 Thread A Z
Hello,

  Is there a way to reset the token stream without running the lexer again?
I have a case where I need to run two different parsers sequentially on the
same token stream. I tried rewind() on the input stream, but this has side
effects (hidden tokens are no longer hidden).

Thanks




[il-antlr-interest: 30590] Re: [antlr-interest] C target character position

2010-11-20 Thread A Z
Thanks for the quick response.  There was a bug in my printf
statements causing the pointer addresses to be incorrect. I was fairly
certain they worked as you described but I wanted to be sure.
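
For the archive, the pointer arithmetic Jim describes in the quoted reply below can
be sketched in C as follows. This is a sketch under assumptions: the start and stop
values are modelled as plain const char pointers into the caller's buffer, stop is
assumed to point at the last character of the token (inclusive), and the function
names are hypothetical rather than part of the ANTLR3 C runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of a token, in bytes, from the start of the buffer handed to ANTLR. */
static size_t token_offset(const char *buffer, const char *start)
{
    assert(buffer != NULL && start >= buffer);
    return (size_t)(start - buffer);
}

/* Token length in bytes, assuming stop addresses the token's last character. */
static size_t token_length(const char *start, const char *stop)
{
    assert(start != NULL && stop >= start);
    return (size_t)(stop - start) + 1;
}
```

With the values printed in the original post, the offset would be
23213648 - 23213624 = 24 and, if stop is inclusive, the length would be
23213653 - 23213648 + 1 = 6.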



On 11/19/10, Jim Idle  wrote:
> The very first token gives you a -1 for the char position in line, I am
> afraid. I need to work around that, I think, but the indexes are pointers into
> memory (your input) and not 0, 1, 2 etc. Note that the token also
> remembers the start of the line that it is located on.
>
> If the start of the first token is not the start of your data, then perhaps
> there are comments and newline tokens that are skipped before the first
> token that the parser sees? If this did not work, there would be a lot of
> broken parsers out there.
>
> So, use the pointer to get the start, subtract it from the end pointer to
> get the length and print out that many characters, which will show you what
> the token matched. The line start is updated when a '\n' is seen by the
> parser, but you can change the character. This is useful for error messages
> when you want to print the text line that an error occurs in.
>
> The offset of the token is the start point minus the input start (use the
> address you pass in (databuffer) and not input->data), however, the pointer
> is pointing directly at that anyway. I think that you are forgetting that
> the token stream does not return off channel tokens or SKIP()ed tokens.
>
> Jim
>
>
>
>> -Original Message-
>> From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
>> boun...@antlr.org] On Behalf Of A Z
>> Sent: Friday, November 19, 2010 4:44 AM
>> To: antlr-interest@antlr.org
>> Subject: [antlr-interest] C target character position
>>
>> Hello,
>>
>>   I'm trying to record the offset of the start of a token, relative to
>> the beginning of the input buffer. My program passes a (char *) buffer
>> to ANTLR and then runs a simple grammar that builds a data structure
>> containing the element types and pointer to their position in the text
>> buffer. The problem is I can't find a way to get the true character
>> offset from ANTLR in order to set the pointer. Below it prints out the
>> results of most of the values for the ANTLR3_COMMON_TOKEN for the very
>> first token. The two subsequent values are the data member and the
>> address of the character buffer. I would expect start, getStartIndex
>> and input->data to be the same but they are different. How can I find
>> the offset of a token, in terms of the number of characters from the
>> start of the stream?
>>
>> Thanks
>>
>> charPosition  : -1
>> getCharPositionInLine : -1
>> getLine   : 1
>> getStartIndex : 23213648
>> getStopIndex  : 23213653
>> getTokenIndex : 0
>> index : 0
>> line  : 1
>> lineStart : 23213648
>> start : 23213648
>> stop  : 23213653
>>
>> (pANTLR3_INPUT_STREAM)input->data 23217928
>> (uint8_t*)dataBuffer  23213624
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>> email-address
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>




[il-antlr-interest: 30583] [antlr-interest] C target character position

2010-11-19 Thread A Z
Hello,

  I'm trying to record the offset of the start of a token, relative to the
beginning of the input buffer. My program passes a (char *) buffer to ANTLR
and then runs a simple grammar that builds a data structure containing the
element types and pointer to their position in the text buffer. The problem
is I can't find a way to get the true character offset from ANTLR in order
to set the pointer. Below it prints out the results of most of the values
for the ANTLR3_COMMON_TOKEN for the very first token. The two subsequent
values are the data member and the address of the character buffer. I would
expect start, getStartIndex and input->data to be the same but they are
different. How can I find the offset of a token, in terms of the number of
characters from the start of the stream?

Thanks

charPosition  : -1
getCharPositionInLine : -1
getLine   : 1
getStartIndex : 23213648
getStopIndex  : 23213653
getTokenIndex : 0
index : 0
line  : 1
lineStart : 23213648
start : 23213648
stop  : 23213653

(pANTLR3_INPUT_STREAM)input->data 23217928
(uint8_t*)dataBuffer  23213624




[il-antlr-interest: 30352] [antlr-interest] Semantic predicate behaviour with k>1

2010-10-15 Thread A Z
Hello,

  I am seeing ANTLR generate unexpected code when using semantic predicates
and am wondering if my grammar or understanding is incorrect. The EBNF has a
rule similar to the following:

rule :
    primary_literal
  | {isIdent(LT(1)->getText(LT(1)),PARAM_IDENT)}?     identifier LBRACKET?
  | {isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT)}? identifier (LBRACKET constant_range_expression RBRACKET)?
  | {isIdent(LT(1)->getText(LT(1)),TYPE_IDENT)}?      identifier APOSTROPHE
  | {isIdent(LT(1)->getText(LT(1)),ENUM_IDENT)}?      identifier
  | {isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT)}?    identifier
  | {isIdent(LT(1)->getText(LT(1)),LET_IDENT)}?       identifier LPARAN?
  | {isIdent(LT(1)->getText(LT(1)),GENBLOCK_IDENT)}?  identifier (LBRACKET constant_expression RBRACKET)? PERIOD
  | {isIdent(LT(1)->getText(LT(1)),PACKAGE_IDENT)}?   identifier COLONCOLON constant_primary_package_scope_suffix
  | identifier ((LPARAN list_of_arguments RPARAN)=> LPARAN list_of_arguments RPARAN)? // tf_call

The last identifier type can be forward declared so that type is assumed if
the identifier at this point is undefined. I previously had tried doing this
by factoring but it makes the grammar very difficult to follow and
substantially increases the number of rules.  With this rule ANTLR generates
the following:

else if ( (LA1039_0 == SIMPLE_IDENT) )
{

    {
        int LA1039_2 = LA(2);
        if ( (LA1039_2 == LBRACKET || LA1039_2 == PERIOD) )
        {
            alt1039=8;
        }
        else if ( (LA1039_2 == APOSTROPHE) )
        {
            alt1039=4;
        }
        else if ( (LA1039_2 == COLONCOLON) )
        {
            alt1039=9;
        }
        else if ( ((isIdent(LT(1)->getText(LT(1)),PARAM_IDENT))) )
        {
            alt1039=2;
        }
        else if ( ((isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT))) )
        {
            alt1039=3;
        }
        else if ( ((isIdent(LT(1)->getText(LT(1)),ENUM_IDENT))) )
        {
            alt1039=5;
        }
        else if ( ((isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT))) )
        {
            alt1039=6;
        }
        else if ( ((isIdent(LT(1)->getText(LT(1)),LET_IDENT))) )

The first 3 conditions look out of place. It appears even with predicates,
ANTLR will increase k if it thinks it can help resolve ambiguities. Chapter
13 in the book doesn't appear to describe cases like this. The first case
won't work as three different alternatives match this sequence. If I force
k=1 for this rule, then the code is generated as expected. Strangely,
removing the PERIOD from the GENBLOCK subrule also works, but breaks the
grammar. Is this expected behaviour?
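
To make the expected k=1 behaviour concrete, here is a standalone C sketch of the
decision I had in mind: only the predicate on LT(1) is consulted, the alternatives
are tried in grammar order, and the tf_call alternative is the fallback for
undeclared identifiers. Everything in it is hypothetical (the Symbol table, is_ident
and choose_alt); it is a model of the intent, not ANTLR output and not the real
isIdent().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Identifier kinds in the same order as alternatives 2..9 of the rule. */
typedef enum {
    PARAM_IDENT, SPECPARAM_IDENT, TYPE_IDENT, ENUM_IDENT,
    GENVAR_IDENT, LET_IDENT, GENBLOCK_IDENT, PACKAGE_IDENT,
    IDENT_KIND_COUNT
} IdentKind;

/* One entry of a hypothetical symbol table. */
typedef struct {
    const char *name;
    IdentKind   kind;
} Symbol;

/* Returns 1 if text is declared in table with the given kind, else 0. */
static int is_ident(const Symbol *table, size_t n, const char *text, IdentKind kind)
{
    assert(text != NULL);
    for (size_t i = 0; i < n; i++)
        if (table[i].kind == kind && strcmp(table[i].name, text) == 0)
            return 1;
    return 0;
}

/* Pick an alternative from the predicate alone: 2..9 for a declared kind,
 * 10 (tf_call) when the identifier is not declared at all. */
static int choose_alt(const Symbol *table, size_t n, const char *text)
{
    for (int k = PARAM_IDENT; k < IDENT_KIND_COUNT; k++)
        if (is_ident(table, n, text, (IdentKind)k))
            return 2 + k;
    return 10;
}
```

In this model the token after the identifier never influences the choice, which is
the difference from the generated code shown above.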




[il-antlr-interest: 30311] Re: [antlr-interest] Lexer errors when looking for wrong token

2010-10-11 Thread A Z
Thanks for the responses.

Kevin,

  Yes, that helps. I'm using the C target so I haven't been able to actually
test it but I see the logic.

Joachim,

  Your explanation is very clear. My grammar is mostly working now so I
don't think I'll be changing lexers. I'll probably first try adding the
extra token definitions and emitting two tokens for those cases.
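
For comparison, the behaviour I originally expected ('[-' coming out as LBRACKET
followed by MINUS) amounts to a longest-match scan that falls back to a shorter
literal. A hand-written C sketch of that scan is below. It is not ANTLR-generated
code, and the names BracketTok and match_bracket are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    T_NONE, T_LBRACKET, T_LBMINUSGT, T_LBASRB, T_LBAST, T_LBEQUALS, T_LBPLUSRB
} BracketTok;

typedef struct {
    const char *text;
    BracketTok  type;
} Literal;

/* Longest literals come first, so the first hit is also the longest match. */
static const Literal literals[] = {
    { "[->", T_LBMINUSGT }, { "[*]", T_LBASRB }, { "[+]", T_LBPLUSRB },
    { "[*",  T_LBAST },     { "[=",  T_LBEQUALS }, { "[",  T_LBRACKET },
};

/* Match the longest '[' literal at s; the matched length goes to *len.
 * Returns T_NONE with *len == 0 if s does not start with '['. */
static BracketTok match_bracket(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    for (size_t i = 0; i < sizeof literals / sizeof literals[0]; i++) {
        size_t n = strlen(literals[i].text);
        if (strncmp(s, literals[i].text, n) == 0) {
            *len = n;
            return literals[i].type;
        }
    }
    *len = 0;
    return T_NONE;
}
```

On '[-1]' this returns T_LBRACKET with a length of 1, leaving '-' to be lexed as
MINUS on the next call, which is what the extra prefix tokens plus two emitted
tokens should reproduce inside ANTLR.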

Ad

On Mon, Oct 11, 2010 at 5:57 AM, Joachim Schrod  wrote:

> A Z wrote:
>
> > I have a lexer with the following rules:
> >
> > LBMINUSGT  : '[->';
> > LBASRB : '[*]';
> > LBAST  : '[*';
> > LBEQUALS   : '[=';
> > LBPLUSRB   : '[+]';
> > LBRACE : '{';
> > LBRACKET   : '[';
> > MINUS  : '-';
> >
> > The lexer fails(with an error message) when any string of '[-' or '[*' is
> > detected. I'm confused why ANTLR cannot tokenize '[-' correctly as
> LBRACKET
> > MINUS.
>
> Because ANTLR-lexers cannot backtrack.
>
> '[-' starts the token LBMINUSGT and only that token. Thus, when '['
> and '-' arrive in input, recognition for the token LBMINUSGT is
> started. When no '>' arrives, the lexer is not able to backtrack to
> the point in time where '-' has not arrived and turn '[' into
> LBRACKET. Since there are no other tokens that start with '[-', an
> error is reported and error recovery takes place.
>
> The canonical way to solve this problem is to create tokens that
> cover all prefixes of all existing tokens. I.e., in your cited
> grammar fragment you need additional tokens that match '[-' and '[+'.
>
> I hope this makes the problem more understandable,
>
>Joachim
>
> PS: Actually, there is a non-canonical way to solve the problem:
> One can use a different tool to generate the lexer, one that can
> backtrack, and use ANTLR only for its great parser abilities.
> That's what I do, I use JFlex, after having fought with ANTLR lexer
> definition restrictions one time too often. ;-)
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Joachim Schrod  Email: jsch...@acm.org
> Roedermark, Germany
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>




[il-antlr-interest: 30300] [antlr-interest] Lexer errors when looking for wrong token

2010-10-10 Thread A Z
Hello,

I have a lexer with the following rules:


LBMINUSGT  : '[->';
LBASRB : '[*]';
LBAST  : '[*';
LBEQUALS   : '[=';
LBPLUSRB   : '[+]';
LBRACE : '{';
LBRACKET   : '[';
MINUS  : '-';

The lexer fails (with an error message) when any string of '[-' or '[*' is
detected. I'm confused why ANTLR cannot tokenize '[-' correctly as LBRACKET
MINUS. It also discards two characters after the failed token. I do not have
a static k defined, and ANTLR generates no warnings when compiling. I'm still
debugging, but it's slow going figuring out how the antlr3dfapredict() function
works. Any help is appreciated.


Test input:
foo[-1]
foo[->saf]
foo[*saf]
foo[+saf]
foo[+]saf]
foo[0]

Test output :
frag.v(1) : lexer error 1 :
Unexpected character at offset 5, near '1' :
1]
Token: SIMPLE_IDENT foo
Token: SIMPLE_IDENT foo
Token:LBMINUSGT [->
Token: SIMPLE_IDENT saf
Token: RBRACKET ]
Token: SIMPLE_IDENT foo
Token:LBAST [*
Token: SIMPLE_IDENT saf
Token: RBRACKET ]
Token: SIMPLE_IDENT foo
Token: SIMPLE_IDENT f
Token: RBRACKET ]
Token: SIMPLE_IDENT foo
Token: LBPLUSRB [+]
Token: SIMPLE_IDENT saf
Token: RBRACKET ]
Token: SIMPLE_IDENT foo
Token: LBRACKET [
Token:   DECNUM 0
Token: RBRACKET ]




[il-antlr-interest: 30158] [antlr-interest] Limitation of method size in Java target

2010-09-15 Thread A Z
Hello,

  I have a somewhat large grammar that I just recently got compiling
with ANTLR with no problems most of the time. I still get timeout
errors occasionally. I'm now at the point where I'd like to run my
test suite against the grammar; however, the Java compiler fails with
the following:

10. ERROR in /output/errorParser.java (at line 15)
public class errorParser extends DebugParser {
 ^^^
The code for the static initializer is exceeding the 65535 bytes limit


I found a thread from last year regarding this issue:

http://groups.google.com/group/il-antlr-interest/browse_thread/thread/f3a4ce6c3a5c803f

I tried a few combinations of the various settings; however, they either
caused the ANTLR compile to fail or did not resolve the Java compile
issue. Is modifying the .java files the only reliable workaround at
this point? It looks like the 4000 "public static final BitSet
FOLLOW*" statements are the cause, but I'm not sure how grammar coding
changes can reduce them.


Thanks




[il-antlr-interest: 30095] [antlr-interest] Handling range-limited tokens

2010-09-07 Thread A Z
Hello all,

  The grammar I am trying to implement has many cases where the terminals
are special cases of identifiers. Below is an excerpt from the EBNF.

seq_input_list ::= level_input_list | edge_input_list
level_input_list ::= level_symbol { level_symbol }
edge_input_list ::= { level_symbol } edge_indicator { level_symbol }
edge_indicator ::= ( level_symbol level_symbol ) | edge_symbol
current_state ::= level_symbol
next_state ::= output_symbol | -
output_symbol ::= 0 | 1 | x | X
level_symbol ::= 0 | 1 | x | X | ? | b | B
edge_symbol ::= r | R | f | F | p | P | n | N | *

simple_identifier ::= [ a-zA-Z_ ] { [ a-zA-Z0-9_$ ] }

My ANTLR grammar is coded like this

edge_input_list :
level_symbol* edge_indicator level_symbol*;

edge_indicator :
LPARAN level_symbol level_symbol RPARAN
  | edge_symbol;

current_state :
level_symbol;

next_state :
output_symbol
  | MINUS;

output_symbol :
BINNUM; // 0 | 1 | x | X

level_symbol :
BINNUM
  | SIMPLE_IDENT; // 0 | 1 | x | X | ? | b | B

edge_symbol :
ASTERISK
  | SIMPLE_IDENT; // r | R | f | F | p | P | n | N | *

I now have a problem where ANTLR can't resolve level_symbol* in rule
edge_input_list because both level_symbol and edge_indicator (through
edge_symbol) resolve to a SIMPLE_IDENT token. However, you'll notice the
actual characters allowed are unique for each terminal. What is the best way
to handle this?
  Originally I had separate tokens for each of the characters and made
simple_identifier a parser rule as follows:

ANYCASER : 'r' | 'R';
ANYCASEB : 'b' | 'B';
SIMPLE_IDENT : (Alpha | '_') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '_' | '$')*;

simple_identifier : SIMPLE_IDENT | ANYCASEB | ANYCASER | ...;

This works but quickly becomes unwieldy as there are other places in the
grammar that have similar situations using overlapping character sets.
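
One possibility, offered only as an assumption and not something I have tried in
this grammar, would be to keep lexing these as SIMPLE_IDENT or BINNUM and check the
token text against the EBNF character sets from a semantic predicate or action. The
checks themselves could look like the C sketch below. The function names are
hypothetical, and how to wire them into the grammar is left open.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if text is exactly one character drawn from set, else 0. */
static int one_of(const char *text, const char *set)
{
    assert(text != NULL && set != NULL);
    return text[0] != '\0' && text[1] == '\0' && strchr(set, text[0]) != NULL;
}

/* Character sets copied from the EBNF excerpt above. */
static int is_output_symbol(const char *t) { return one_of(t, "01xX"); }
static int is_level_symbol(const char *t)  { return one_of(t, "01xX?bB"); }
static int is_edge_symbol(const char *t)   { return one_of(t, "rRfFpPnN*"); }
```

Since the level and edge sets do not overlap, a single-character identifier such as
'r' satisfies only is_edge_symbol, which is the distinction the parser cannot see
from the token type alone.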

Thanks




[il-antlr-interest: 29980] Re: [antlr-interest] Test rig template (Was: Setting input in the interpreter)

2010-08-23 Thread A Z
ANTLRWorks creates __Test__.java in the output path. This creates an
instance of your parser with a debug port, so you cannot run it standalone:
it hangs waiting for the debugger. You can simply change the parser
instance to fix this.



On Mon, Aug 23, 2010 at 3:42 AM, Thomas Nilsson <
thomas.nils...@responsive.se> wrote:

> So, there were no-one that could point me in the right direction about the
> Test Rig? Is there a template? Does ANTLRWorks/IDE create one that I could
> start from? If so where would I find it?
>
> (Or maybe this is described in The Book? If so I'll be receiving it in a
> couple of days...)
>
>
> Thomas Nilsson, CTO, Agile Mentor
> Responsive Development Technologies AB
> Web: http://www.responsive.se
> Email: thomas.nils...@responsive.se
> Phone: +46 70 561 75 41
> Blog: http://www.responsive.se/thomas
>
> 20 aug 2010 kl. 21.55 skrev Thomas Nilsson:
>
> > 20 aug 2010 kl. 18.11 skrev Jim Idle:
> >
> >> Did you notice the tab that allows you to change the test rig template?
> >>
> >
> > Yes, I did, but it was not easy to find it again, now that I know that I
> probably need it ;-)
> >
> > I found one on the "Run"-menue and some setting in the Preferences (but
> no tab). But I don't really know what to make of it. How do I use it? I
> guess that the "Class" is to indicate a java class to load, and that the
> classpath should be tweaked accordingly in the
> Preferences->Compiler->Classpath. Would I lose any system classpaths that I
> need to explicitly add if I customize it?
> >
> > And finally, the class should be a complete implementation of the test
> rig. Is there a template somewhere?
> >
> > (All this is in ANTLRWorks 1.4, couldn't find anything similar in
> ANTLR-IDE, Eclipse 3.6...)
> >
> > Thomas Nilsson, CTO, Agile Mentor
> > Responsive Development Technologies AB
> > Web: http://www.responsive.se
> > Email: thomas.nils...@responsive.se
> > Phone: +46 70 561 75 41
> > Blog: http://www.responsive.se/thomas
> >
> >> Jim
> >>
> >>> -Original Message-
> >>> From: antlr-interest-boun...@antlr.org [mailto:antlr-interest-
> >>> boun...@antlr.org] On Behalf Of Thomas Nilsson
> >>> Sent: Thursday, August 19, 2010 11:17 PM
> >>> To: antlr-interest@antlr.org
> >>> Subject: [antlr-interest] Setting input in the interpreter
> >>>
> >>> Is there a way to set the input stream in the interpreter, either
> >> ANTLRWorks
> >>> or ANTLR IDE?
> >>>
> >>> My grammar is case insensitive,  and although there is an example on
> the
> >>> Wiki on how to do this for real, I was looking for a way to run my
> >> numerous
> >>> test cases in the interpreter before implementing anything else.
> >>>
> >>> Thomas Nilsson, CTO, Agile Mentor
> >>> Responsive Development Technologies AB
> >>> Web: http://www.responsive.se
> >>> Email: thomas.nils...@responsive.se
> >>> Phone: +46 70 561 75 41
> >>> Blog: http://www.responsive.se/thomas
> >>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> >>> email-address
> >>
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>




[il-antlr-interest: 29968] Re: [antlr-interest] ANTLR enforces LL(1) beyond about 300 tokens

2010-08-22 Thread A Z
'randcase';
K_RANDSEQUENCE = 'randsequence';
K_RCMOS = 'rcmos';
K_REAL = 'real';
K_REALTIME = 'realtime';
K_REF = 'ref';
K_REG = 'reg';
K_REJECT_ON = 'reject_on';
K_RELEASE = 'release';
K_REPEAT = 'repeat';
K_RESTRICT = 'restrict';
K_RETURN = 'return';
K_RNMOS = 'rnmos';
K_RPMOS = 'rpmos';
K_RTRAN = 'rtran';
K_RTRANIF0 = 'rtranif0';
K_RTRANIF1 = 'rtranif1';
K_SCALARED = 'scalared';
K_SEQUENCE = 'sequence';
K_SHORTINT = 'shortint';
K_SHORTREAL = 'shortreal';
K_SHOWCANCELLED = 'showcancelled';
K_SIGNED = 'signed';
K_SMALL = 'small';
K_SOLVE = 'solve';
K_SPECIFY = 'specify';
K_SPECPARAM = 'specparam';
//K_STATIC = 'static';
K_STRING = 'string';
K_STRONG = 'strong';
K_STRONG0 = 'strong0';
K_STRONG1 = 'strong1';
//K_STRUCT = 'struct';
K_SUPER = 'super';
K_SUPPLY0 = 'supply0';
K_SUPPLY1 = 'supply1';
K_TABLE = 'table';
K_TASK = 'task';
K_TIME = 'time';
K_TRAN = 'tran';
K_TRANIF0 = 'tranif0';
K_TRANIF1 = 'tranif1';
K_TRI = 'tri';
K_TRI0 = 'tri0';
K_TRI1 = 'tri1';
K_TRIAND = 'triand';
K_TRIOR = 'trior';
K_TRIREG = 'trireg';
K_UNSIGNED = 'unsigned';
K_USE = 'use';
K_UWIRE = 'uwire';
K_VECTORED = 'vectored';
K_WAIT = 'wait';
K_WAND = 'wand';
K_WEAK0 = 'weak0';
K_WEAK1 = 'weak1';
K_WHILE = 'while';
K_WIRE = 'wire';
K_WOR = 'wor';
K_XNOR = 'xnor';
K_XOR = 'xor';
KD_FATAL = '$fatal';
KD_ERROR = '$error';
KD_WARNING = '$warning';
KD_INFO = '$info';
KD_HOLD = '$hold';
KD_SETUP = '$setup';
KD_SETUPHOLD = '$setuphold';
KD_RECOVERY = '$recovery';
ATSIGN = '@';
ATTWO = '@@';
PLUS = '+';
MINUS = '-';
ASTERISK = '*';
AMPERSAND = '&';
DOLLAR = '$';
TILDE = '~';
FSLASH = '/';
PERCENT = '%';
ASTWO = '**';
CGT = '>';
CLT = '<';
BANG = '!';
EQUALSTWO = '==';
BANGEQUALS = '!=';
EQUALSTHREE = '===';
BANGEQUALSTWO = '!==';
VBAR = '|';
LTTWO = '<<';
GTTWO = '>>';
LTTHREE = '<<<';
GTTHREE = '>>>';
POUND = '#';
LPARAN = '(';
RPARAN = ')';
SEMICOLON = ';';
COLON = ':';
COMMA = ',';
LBRACKET = '[';
RBRACKET = ']';
LBRACE = '{';
RBRACE = '}';
PERIOD = '.';
EQUALS = '=';
QMARK = '?';
EQUALSGT = '=>';
FULLCON = '*>';
AMPSTAR = '&*';
PERIODAS = '.*';
COLONCOLON = '::';
PLUSPLUS = '++';
MINUSMINUS = '--';
GTEQUALS = '>=';
LTEQUALS = '<=';
COLONEQUALS = ':=';
COLONFSLASH = ':/';
POUNDTWO = '##';
ASCOLCOLAS = '*::*';
POUNDMINUSPOUND = '#-#';
POUNDEQUALSPOUND = '#=#';
LBASRB = '[*]';
LBPLUSRB = '[+]';
AMPTWO = '&&';
AMPTHREE = '&&&';
BARTWO = '||';
LPARANAS = '(*';
ASRPARAN = '*)';
APOSTROPHE = '\'';
PLUSEQUALS = '+=';
MINUSEQUALS = '-=';
PLUSCOLON = '+:';
MINUSCOLON = '-:';
LPASRP = '(*)';
BARMINUSGT = '|->';
TILDEAMP = '~&';
TILDEBAR = '~|';
CARET = '^';
TILDECARET = '~^';
CARETTILDE = '^~';
LTMINUSGT = '<->';

EQUALSTWOQMARK = '==?';
BANGEQUALSQMARK = '!=?';
MINUSGT = '->';
}

fragment Alpha : ('a'..'z' | 'A'..'Z');
fragment IdentChar : ('0'..'9' | 'a'..'z' | 'A'..'Z' | '$' | '_');
SIMPLE_IDENT  : (Alpha | '_') IdentChar*;

unary_op  :
PLUS
  | MINUS
  | BANG
  | TILDE
  | AMPERSAND
  | TILDEAMP
  | VBAR
  | TILDEBAR
  | CARET
  | TILDECARET
  | CARETTILDE;




[il-antlr-interest: 29966] [antlr-interest] ANTLR enforces LL(1) beyond about 300 tokens

2010-08-22 Thread A Z
Hello,

  I am trying to develop a SystemVerilog grammar using ANTLR 3.2. I was able
to successfully construct a Verilog2005 grammar and verified it against
about 800 tests. I used the same approach for SystemVerilog but upon
compilation I get lots of errors that make it clear ANTLR is only using
LL(1).

SystemVerilog has about twice the number of keywords and 50% more operators
than Verilog2005, so I took the working Verilog2005 grammar and reduced it to
just the tokens and a single rule:


grammar Verilog2005;

tokens
{
K_ACCEPT_ON = 'accept_on';
K_ALIAS = 'alias';
K_ALWAYS = 'always';
.
.
.
EQUALSTWOQMARK = '==?';
BANGEQUALSQMARK = '!=?';
MINUSGT = '->';
}

fragment Alpha     : ('a'..'z' | 'A'..'Z');
fragment IdentChar : ('0'..'9' | 'a'..'z' | 'A'..'Z' | '$' | '_');
SIMPLE_IDENT  : (Alpha | '_') IdentChar*;

unary_op  :
PLUS
  | MINUS
  | BANG
  | TILDE
  | AMPERSAND
  | TILDEAMP
  | VBAR


I then slowly added the SystemVerilog tokens until it started failing.
Around 300 tokens I start getting these errors:

warning(209): temp.g:341:1: Multiple token rules can match input such as
"'a'": K_ACCEPT_ON, K_ALIAS, K_ALWAYS, K_ALWAYS_COMB, K_ALWAYS_FF,
K_ALWAYS_LATCH, K_AND, K_ASSERT, K_ASSIGN, K_ASSUME, K_AUTOMATIC,
SIMPLE_IDENT

As a result, token(s)
K_ALIAS,K_ALWAYS,K_ALWAYS_COMB,K_ALWAYS_FF,K_ALWAYS_LATCH,K_AND,K_ASSERT,K_ASSIGN,K_ASSUME,K_AUTOMATIC,SIMPLE_IDENT
were disabled for that input


I am not sure how to resolve this.  Removing the final identifier token also
allows a clean compile but the ANTLR book indicates ANTLR should try to
match in the order listed. Thanks.
