I think that you are of course somewhat correct in that there seem to be
certain ways you can express things that cause such explosions, and perhaps
these can be looked at case by case for the reasons why and fixed/improved.
But my experience tells me that the things that cause such explosions are
generally not the greatest ways to express things anyway. Sure, if they are
mathematically equivalent then they should resolve down to the same code. But
there are so many ways to construct things that I think at times practicality
gets in the way. Obviously equivalence has to be proven in some finite time,
but more than that, just keeping the analysis code manageable seems desirable
to some extent. However, as soon as users throw in actions then things like
equivalence got out the window anyway.
So, I don't generally end up with such huge explosions and if I do I usually
notice that my grammar looks a bit wonky anyway and there is a better/easier
way to express things.
Of course, apart from the things that might or might not be improved, there are
things that cannot be done because of how users organize things. For instance:
grammar t3;
r : SEQ* EOF ;
SEQ : (INT '.')? INT ;
fragment
INT : '0'..'9'+
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
In the SEQ rule, having the optional prefix causes quite a lot of DFA for what
it is. However it is clearly equivalent to:
SEQ : INT ('.' INT)? ;
... for all practical purposes, which generates much simpler code. In a parser
it is still trivial to write the AST according to what such elements 'mean' and
even if you are placing actions directly in the parser, you can still construct
it this second way (parser code is much better here - this is just a trivial
example to labor a point). However, as soon as a user says:
SEQ : (INT '.' {action because this is present} )? INT ;
Then no amount of analysis can get around the fact that the action has to be
executed. I only point out here that writing grammars without much thought to
what is going on under the hood is probably not such a good idea in the first
place.
In general I have seen that the user's organization of the grammar is the thing
that far outweighs any cleverness that is lacking in ANTLR. But, I did say in
general and "2 out of 3..." ;-).
At the end of the day, nothing is better than a well formed left factored
grammar. People seem to think that backtracking or lots of predicates adds some
readability to a grammar; personally I think it is the opposite and that what
happens is that people train themselves to read/see it that way. Train yourself
to see it differently and you will find backtracking grammars and lots of
predicates totally get in the way of your thinking.
Of course, horses for courses as they say, but as a professional programmer,
one should usually strive to produce neat, well crafted stuff and not throw
stuff together and hope the tool set works it out for you. I hasten to add here
that I am not throwing that accusation at the people involved here - I have no
idea how they put their grammars together.
I personally don't have any objection to putting DFAs in their own space, or
otherwise moving around the statics, but I do seem to think that we went down
this road and there seemed to be some particularly good reason not to do it.
But, either that reason is no longer valid, or I just plain forgot what it
was!! The Java target isn't my bag man, so I will leave that up to Ter (I will
make a not to ask him to look at it). C does not have this problem per se, but
of course if you end up with huge tables of static DFA, then there is a lot of
data space in the resulting object code.
Jim
> -----Original Message-----
> From: Loring Craymer [mailto:[email protected]]
> Sent: Wednesday, November 04, 2009 11:13 AM
> To: Jim Idle; Antlr interest
> Subject: Re: [antlr-interest] Big grammar => static initializer/method
> size is exceeding the 65535 bytes limit
>
> If DFA size grew linearly with the size of the grammar, Jim would be
> correct. The evidence I have seen is that they grow nonlinearly, so
> partitioning a grammar will not always work, and it is always best if a
> tool circumvents mysterious "explosions" during development. Alex's
> solution is both elegant and easy to apply, and the net cost is a few
> more .class files.
>
> --Loring
>
>
>
>
> ----- Original Message ----
> > From: Jim Idle <[email protected]>
> > To: Antlr interest <[email protected]>
> > Sent: Wed, November 4, 2009 8:10:14 AM
> > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method size is exceeding the 65535 bytes limit
> >
> > Guys - you are asking for the wrong problem to be fixed (at least of
> the three
> > of you, at least two will be ;-). Try the new -X options, then look
> at splitting
> > your grammar into multiple import grammars, then start taking out
> huge
> > predicates such as (expression)=> or generally (rule)=>. You can stop
> anywhere
> > along that path if you do not feel that optimizing the grammar is
> something
> > worth your while and the first and/or second options make the DFA
> table size
> > issue go away.
> >
> > There are cases where big DFAs become inevitable, and then you should
> definitely
> > look at the import capability, which will prevent a single parser
> class being
> > used for everything and allow you to manage what goes in which class
> at the
> > grammar level.
> >
> > Jim
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:antlr-interest-
> > > [email protected]] On Behalf Of Renee Luo
> > > Sent: Wednesday, November 04, 2009 6:45 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Yes, we are trying to migrate our ANTLR2 grammar to ANTLR2, we are
> also
> > > facing this problem. If the static initialize code will be
> separated
> > > from parser_class, That's will be great for us.
> > >
> > > Renee
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:antlr-interest-
> > > [email protected]] On Behalf Of Andreas Meyer
> > > Sent: Wednesday, November 04, 2009 8:32 AM
> > > To: Antlr interest
> > > Subject: Re: [antlr-interest] Big grammar => static
> initializer/method
> > > size is exceeding the 65535 bytes limit
> > >
> > > Back in the days when we tried to migrate our ANTLR2 grammar to
> ANTLR3,
> > > we also experienced this problem, due to lots of static initializer
> > > code
> > > in the _parser_ class. Our solution was to apply some perl-skript
> > > magic,
> > > but if Alex Marin now proposes a built-in solution, that is only
> good
> > > for ANTLR.
> > >
> > > Andreas
> > >
> > > Jim Idle schrieb:
> > > > I think that the issue is more likely something to do with your
> lexer
> > > specification. You should not need to worry about having lots of
> > > keywords, so one of the other rules must be causing the huge
> expansion.
> > > For instance I have problems with the complete lexer for TSQL,
> which
> > > has more keywords than you can shake a stick at.
> > > >
> > > > Did you ever post your complete lexer spec? I was out of the
> country
> > > when you first started this thread.
> > > >
> > > > Jim
> > > >
> > >
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> > >
> > > This email and its attachments may be confidential and are intended
> > > solely for the use of the individual to whom it is addressed. Any
> views
> > > or opinions expressed are solely those of the author and do not
> > > necessarily represent those of ImexSystems Inc.
> > > If you are not the intended recipient of this email and its
> > > attachments, you must take no action based upon them, nor must you
> copy
> > > or show them to anyone.
> > > Please contact the sender if you believe you have received this
> email
> > > in error.
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> > > email-address
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-
> address
>
>
>
>
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"il-antlr-interest" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/il-antlr-interest?hl=en
-~----------~----~----~----~------~----~------~--~---