> Ideas for changing the generated Java parsers.  I can implement most of
> these.  Comments on the interface and semantics would be appreciated.
This is great material, thanks.

> 1. The parser class can be declared public and/or abstract by using the
> ``%define public'' and ``%define abstract'' directives.  I can add the
> other Java class modifiers with ``%define final'' and ``%define strictfp''
> and ``%define annotations "@..."'' directives (plural, since %define's
> are not combined, so all annotations must be specified in the same
> %define) to be complete.
> Or, we can have a single ``%define parser_class_modifiers "..."'' to
> specify all modifiers together.  Or both.
>
> Implemented ``%define final/strictfp''.
>
> 3. Add ``%define extends "Super"'' and ``%define implements "Interfaces"''.
>
> Implemented.
>
> 4. Add ``%define lex_throws "Exceptions"'' to parse() and not use it as
> the default of ``%define throws "Exceptions"''.

I already have your patch for these, right?

For (1), annotations would be nice, but not high priority of course.

> 2. The parser class name currently defaults to ``YYParser'' (actually,
> ``b4_prefixParser'', but I already submitted a patch for that bug).
> Do people prefer to make it match the Java file name instead?  Of course,
> characters not allowed in Java names must be removed or replaced by ``_''.

Yeah, that's nice since the filename is known at m4 time.

> 5. Work around Java's ``code too large'' limitation for large parser tables.
> http://lists.gnu.org/archive/html/help-bison/2008-10/msg00005.html
> Bison could try to estimate how much bytecode is needed and choose to
> generate code accordingly, but it depends on the actual Java compiler
> and the amount of user static initialization, so that's not a good idea.
> The syntax in the text (not appendix) of the ``Java Language Specification,
> Second Edition'' is just at the limit, depending on how it's converted from
> EBNF and how conflicts are removed.  The awk, cim, and pic grammars from
> tests/existing.at are all under the limit.
>
> Implemented ``%define parser_tables "small/medium/large"'' except docs
> and tests.
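A minimal sketch of one way to stay under the limit described in item 5, assuming the tables are plain `short[]` constants. The names `yypact_` and `yytable_` and their contents are placeholders, not real Bison output. An array literal compiles to instructions that store its elements one by one, and the JVM caps each method's bytecode at 64 KB. Giving every table its own private static builder method leaves the class initializer with only a few calls and puts each table under its own limit.

```java
/**
 * Sketch: one private static builder method per parser table.
 * Static field initializers all end up in the class initializer (<clinit>);
 * moving each array literal into its own method means the 64 KB bytecode
 * limit applies to each table separately instead of to all of them together.
 * Table names and contents are hypothetical placeholders.
 */
public final class TableInitSketch {
    static final short[] yypact_ = yypact_init();
    static final short[] yytable_ = yytable_init();

    // The store instructions for this literal live in this method only.
    private static short[] yypact_init() {
        return new short[] { -3, 1, -3, 0, -3 };
    }

    private static short[] yytable_init() {
        return new short[] { 2, 4, 0, 1 };
    }

    public static void main(String[] args) {
        // Prints "5 4": both tables were built by their own builders.
        System.out.println(yypact_.length + " " + yytable_.length);
    }
}
```

A single table whose own literal exceeds the limit would still have to be split further, for example across several builder methods whose results are concatenated.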
This only needs changes to the Java skeleton.  I think switching
unconditionally to one-initializer-per-table is the easiest approach by
far, and gives 99% of the benefit.

> 6. Currently ``%union'' is silently ignored

I thought it gave an error. :-)

> , and Java types are used as
> the TYPE in ``$<TYPE>'', ``%token<TYPE> ...'' and ``%type<TYPE> ...''.
> I propose to interpret these ``<TYPE>'' as a field name in ``%union'',
> interpreting it as a Java type if no such field name exists.
> First, this matches the behavior of C/C++ parsers, even though Java doesn't
> actually have union types.  Also makes it easier to convert from C/C++.

It is a bit of a waste of memory...

> Second, this allows the use of generic types since ``<TYPE>'' does not
> allow ``>'' in TYPE.
[...]
> Bison doesn't play nice with generics: '>' is not allowed
> in ``<TYPE>'' by the syntax, and using a generic ``%define stype'' gives
> a ``generic array creation'' error, and at least one place needs to
> m4-quote commas in ``<TYPE>'' (fails with Map/*String,String*/).

... and if the Bison lexer were adjusted to allow balanced angle brackets in
TYPE instead, it would be possible to define the token stack as Object[],
right?  But I did not understand the comment about m4-quoting commas.

> 7. Should we make sure ``$$'' has the right type?  For example:

Yes, would be nice.

> 8. Remove ``public static final boolean bison = true;''.  This corresponds
> to ``#define YYBISON 1'' in C parsers, which can be used for conditional
> compilation.  There is no conditional compilation in Java, though you
> probably can use reflection.  Might as well use ``bisonVersion'' anyway.
> Or keep it for compatibility, and document it as ``public'' interface.

Removing it is okay with me.  It was meant for reflection, but bisonVersion
seems good enough.

> 9. Document ``bisonVersion'' and ``bisonSkeleton'' as part of the ``public''
> interface.

Yes, thanks.

> 10. If ``%error-verbose'' is not used, do not generate code for it.
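Items 8 and 10 both touch on conditional compilation. Java has no preprocessor. However, the JLS section on unreachable statements deliberately allows `if (FLAG)` on a constant `static final boolean` "flag variable", so a compiler may leave the guarded code out of the class file. An ordinary field does not qualify, because its value is only known at run time. The following minimal sketch shows the difference. `ERROR_VERBOSE`, `errorMessage`, and `runtimeMessage` are hypothetical names, not part of any generated parser.

```java
/**
 * Sketch: a compile-time constant flag versus a runtime flag.
 * Names are hypothetical; the message text only imitates Bison's style.
 */
public final class ConstantFlagSketch {
    // A static final boolean initialized with a literal is a constant
    // variable, so the body of "if (ERROR_VERBOSE)" may be compiled away.
    static final boolean ERROR_VERBOSE = false;

    // An ordinary field is not a constant: the code it guards must stay
    // in the class file because the value can change at run time.
    private boolean errorVerboseAtRuntime = true;

    String errorMessage(String unexpected) {
        if (ERROR_VERBOSE) {
            return "syntax error, unexpected " + unexpected;
        }
        return "syntax error";
    }

    String runtimeMessage(String unexpected) {
        if (errorVerboseAtRuntime) {
            return "syntax error, unexpected " + unexpected;
        }
        return "syntax error";
    }

    public static void main(String[] args) {
        ConstantFlagSketch p = new ConstantFlagSketch();
        System.out.println(p.errorMessage("NUM"));   // syntax error
        System.out.println(p.runtimeMessage("NUM")); // syntax error, unexpected NUM
    }
}
```

Flipping `ERROR_VERBOSE` to `true` and recompiling is the Java counterpart of toggling a `#define`. This is why a skeleton could emit a constant when the feature is off instead of a settable field.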
Not top priority, but I would not oppose this.  I thought javac could in
principle elide it, but maybe it does not because native methods can set
final and private fields.

> Document that ``errorVerbose'' can be changed given ``%error-verbose''.
> Or make it ``yyErrorVerbose'' and provide getter and setter like ``%debug''.

The latter option is definitely better.

> 11. If ``%debug'' or -t/--debug is not used, do not generate code for it.
> How to turn debugging on and off is already documented.

Makes sense as well.

> 12. Don't generate token names when not needed.  If ``%token-table''
> or -k/--token-table is used, also generate the following functions:
>
>   /** Returns the token number (for returning from yylex) for NAME.
>       NAME does not have to be quoted, but when unquoted, it is first
>       matched with a double-quoted literal string token, then a single-quoted
>       character token type, then a named token type.  */
>   public int getTokenNumber(String name);
>
>   /** Returns the token number (for returning from yylex) for NAME.
>       NAME does not have to be quoted, and only matches a double-quoted
>       literal string token from the grammar.  */
>   public int getStringTokenNumber(String name);
>
> By the way, the example in the ``Interface / Lexical / Calling Convention''
> node of the manual is wrong.  It gives the internal token number, not the
> one returned by yylex.  It needs to pass through ``yytoknum''.
> C/C++ probably should define these functions too.

Ok.

> 13. Make the yyerror functions public.  Otherwise, it's a member function
> of the lexer, which is not available if defined with ``%code lexer {...}''.
> Even without ``%code lexer {...}'', users shouldn't have to save a reference
> to the lexer just to call its yyerror when it's already saved in the parser.

Ok.

> 14. Allow user-defined location class.  Currently, you can only change its
> name by using ``%define location_type''.
> We need this to print abbreviated ranges, where common file names and line
> numbers are not printed.  Or to use an encoded int or long for Position.
> Perhaps ``%define location_type'' can be changed to mean that the default
> class should not be generated.  Or use ``%define no_default_location_type''.

So, the location_type would be either "Location" (with the default code) or
an external user-defined type (and likewise for position_type).  That indeed
seems better, but in that case I would change the default implementations'
names to YYLocation and YYPosition.

> 15. Define a default position class with line and column, as in C, unless
> ``%define position_type'' is used.  This is not backward compatible though.
> To be fully backward compatible, we need ``%define default_position_type''
> and ``%define no_default_location_type''.  Ugly.

Don't worry (too much) about backwards compatibility.

> 16. Support ``%printer''.  Even though the virtual toString() method is
> used to print symbols, symbols of the same semantic type may have to be
> displayed differently (for example, see Bison's grammar of itself) and
> we shouldn't have to define new (sub)classes just to print differently.
> Also, we may want something different from the ``natural'' toString(),
> for example, quoted characters and strings (can't even override the
> ``toString()'' of Character and String because they're final classes).

%printer was a bit in flux, IIRC, while I was writing the skeleton.
Overriding toString() is a good idea; the printer would then just be an
expression returning a String, right (with "$$.toString()" as the default)?

> 17. Allow gcj version < 4.3 to be used in the testsuite.  Currently
> (in gnulib/m4/javacomp.m4), for gcj version < 4.3, only
> ``-source 1.4 && -target 1.4'' and ``-source 1.3 && -target 1.4''
> are allowed.  According to the comments there, these gcj versions really
> do target 1.4.
> I guess we can relax that, or use -source 1.3 -target 1.4
> for bison's configure.ac ``gt_JAVACOMP([1.3], [1.4])''.
> I checked that ``javac -source 1.3 -target 1.4'' works with JDK 1.6.
> By the way, gcj 3.4.4 is the ``system'' compiler on Cygwin.

Good.

> 19. On Cygwin, ``make check'' fails when using ``java'' because it generates
> CR/LF output while autotest uses LF.  Maybe do a ``sed -e 's/\r$//''' on the
> output?

Yes.

Thanks,

Paolo

_______________________________________________
help-bison@gnu.org
http://lists.gnu.org/mailman/listinfo/help-bison