I think that you are of course somewhat correct in that there seem to be 
certain ways you can express things that cause such explosions, and perhaps 
these can be looked at case by case for the reasons why and fixed/improved.

But my experience tells me that the things that cause such explosions are 
generally not the greatest ways to express things anyway. Sure, if they are 
mathematically equivalent then they should resolve down to the same code. But 
there are so many ways to construct things that I think at times practicality 
gets in the way. Obviously equivalence has to be proven in some finite time, 
but more than that, just keeping the analysis code manageable seems desirable 
to some extent. However, as soon as users throw in actions then things like 
equivalence got out the window anyway.

So, I don't generally end up with such huge explosions and if I do I usually 
notice that my grammar looks a bit wonky anyway and there is a better/easier 
way to express things.

Of course, apart from the things that might or might not be improved, there are 
things that cannot be done because of how users organize things. For instance:

grammar t3;

r : SEQ* EOF ;

SEQ : (INT '.')? INT ;

fragment
INT :   '0'..'9'+
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

In the SEQ rule, having the optional prefix causes quite a lot of DFA for what 
it is. However it is clearly equivalent to:

SEQ : INT ('.' INT)? ;

... for all practical purposes, which generates much simpler code. In a parser 
it is still trivial to write the AST according to what such elements 'mean' and 
even if you are placing actions directly in the parser, you can still construct 
it this second way (parser code is much better here - this is just a trivial 
example to labor a point). However, as soon as a user says:

SEQ : (INT '.' {action because this is present} )? INT ;

Then no amount of analysis can get around the fact that the action has to be 
executed. I only point out here that writing grammars without much thought to 
what is going on under the hood is probably not such a good idea in the first 
place.

In general I have seen that the user's organization of the grammar is the thing 
that far outweighs any cleverness that is lacking in ANTLR. But, I did say in 
general and "2 out of 3..." ;-). 

At the end of the day, nothing is better than a well formed left factored 
grammar. People seem to think that backtracking or lots of predicates adds some 
readability to a grammar; personally I think it is the opposite and that what 
happens is that people train themselves to read/see it that way. Train yourself 
to see it differently and you will find backtracking grammars and lots of 
predicates totally get in the way of your thinking. 

Of course, horses for courses as they say, but as a professional programmer, 
one should usually strive to produce neat, well crafted stuff and not throw 
stuff together and hope the tool set works it out for you. I hasten to add here 
that I am not throwing that accusation at the people involved here - I have no 
idea how they put their grammars together.

I personally don't have any objection to putting DFAs in their own space, or 
otherwise moving around the statics, but I do seem to think that we went down 
this road and there seemed to be some particularly good reason not to do it. 
But, either that reason is no longer valid, or I just plain forgot what it 
was!! The Java target isn't my bag man, so I will leave that up to Ter (I will 
make a not to ask him to look at it). C does not have this problem per se, but 
of course if you end up with huge tables of static DFA, then there is a lot of 
data space in the resulting object code.

Jim

> -----Original Message-----
> From: Loring Craymer [mailto:[email protected]]
> Sent: Wednesday, November 04, 2009 11:13 AM
> To: Jim Idle; Antlr interest
> Subject: Re: [antlr-interest] Big grammar => static initializer/method
> size is exceeding the 65535 bytes limit
> 
> If DFA size grew linearly with the size of the grammar, Jim would be
> correct.  The evidence I have seen is that they grow nonlinearly, so
> partitioning a grammar will not always work, and it is always best if a
> tool circumvents mysterious "explosions" during development.  Alex's
> solution is both elegant and easy to  apply, and the net cost is a few
> more .class files.
> 
> --Loring
> 
> 
> 
> 
> ----- Original Message ----
> > From: Jim Idle <[email protected]>
> > To: Antlr interest <[email protected]>
> > Sent: Wed, November 4, 2009 8:10:14 AM
> > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method size is exceeding the 65535 bytes limit
> >
> > Guys - you are asking for the wrong problem to be fixed (at least of
> the three
> > of you, at least two will be ;-). Try the new -X options, then look
> at splitting
> > your grammar into multiple import grammars, then start taking out
> huge
> > predicates such as (expression)=> or generally (rule)=>. You can stop
> anywhere
> > along that path if you do not feel that optimizing the grammar is
> something
> > worth your while and the first and/or second options make the DFA
> table size
> > issue go away.
> >
> > There are cases where big DFAs become inevitable, and then you should
> definitely
> > look at the import capability, which will prevent a single parser
> class being
> > used for everything and allow you to manage what goes in which class
> at the
> > grammar level.
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:antlr-interest-
> > > [email protected]] On Behalf Of Renee Luo
> > > Sent: Wednesday, November 04, 2009 6:45 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Yes, we are trying to migrate our ANTLR2 grammar to ANTLR2, we are
> also
> > > facing this problem. If the static initialize code will be
> separated
> > > from parser_class, That's will be great for us.
> > >
> > > Renee
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:antlr-interest-
> > > [email protected]] On Behalf Of Andreas Meyer
> > > Sent: Wednesday, November 04, 2009 8:32 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Back in the days when we tried to migrate our ANTLR2 grammar to
> ANTLR3,
> > > we also experienced this problem, due to lots of static initializer
> > > code
> > > in the _parser_ class. Our solution was to apply some perl-skript
> > > magic,
> > > but if Alex Marin now proposes a built-in solution, that is only
> good
> > > for ANTLR.
> > >
> > > Andreas
> > >
> > > Jim Idle schrieb:
> > > > I think that the issue is more likely something to do with your
> lexer
> > > specification. You should not need to worry about having lots of
> > > keywords, so one of the other rules must be causing the huge
> expansion.
> > > For instance I have problems with the complete lexer for TSQL,
> which
> > > has more keywords than you can shake a stick at.
> > > >
> > > > Did you ever post your complete lexer spec? I was out of the
> country
> > > when you first started this thread.
> > > >
> > > > Jim
> > > >
> > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> > >
> > > This email and its attachments may be confidential and are intended
> > > solely for the use of the individual to whom it is addressed. Any
> views
> > > or opinions expressed are solely those of the author and do not
> > > necessarily represent those of ImexSystems Inc.
> > > If you are not the intended recipient of this email and its
> > > attachments, you must take no action based upon them, nor must you
> copy
> > > or show them to anyone.
> > > Please contact the sender if you believe you have received this
> email
> > > in error.
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-
> address
> 
> 
> 
> 




List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to