Justin,
Use gperf to generate a perfect hash of your tokens. Use a simple lexer rule to match anything, then a custom dictionary for the lexer that uses your pre-known token numbers. Look the matched text up in the perfect hash and change the token type to the returned value. I have done this a number of times when there is a large number of fixed keywords, and it is easy to maintain. To be honest, perhaps you can do everything you need with just gperf.

For example, from a gperf input file for a MySQL lexer:

%{
// Pick up the token definitions as assigned by ANTLR
//
#include "MySqlLexer.h"

// Certain keywords, as well as being part of the keyword set are also
// reserved words that cannot normally be used as identifiers the lexer
// defines the value IS_RESERVED for us to use to indicate this.
//
%}
%struct-type
%ignore-case
%language=ANSI-C
%define hash-function-name getKeyword
%define lookup-function-name getInWordSet
%7bit
%compare-lengths
%readonly-tables
%switch=1
%omit-struct-type mySqlKeywordTok;
%%
# --------------
# Reserved words
#
# Reserved words are used exclusively to specify syntactical
# constructs in SQL and may not be used as identifiers.
#
ADD, KADD | IS_RESERVED
ALL, ALL | IS_RESERVED
ALTER, ALTER | IS_RESERVED
ANALYZE, ANALYZE | IS_RESERVED
AND, AND | IS_RESERVED

> -----Original Message-----
> From: [email protected] [mailto:antlr-interest-[email protected]] On Behalf Of Justin Murray
> Sent: Monday, March 14, 2011 9:00 AM
> To: [email protected]
> Subject: [antlr-interest] Best way to handle a large number of language
> constants?
>
> Hi All,
>
> I am working on a proprietary language of ours that is reminiscent of
> BASIC in some ways, but has morphed over the years into its own
> monstrosity. This language is primarily used to command our hardware
> devices. Our system has a large number of "parameters" (762 to be
> precise) that define the complex configuration of the hardware.
> This configuration mainly lives in a file that is read and sent down to the
> hardware (where it is stored as a simple array of values), but there is
> also the desire to edit these parameters programmatically at runtime.
> Each parameter has a name, and a numeric value. The desire is for each
> parameter to be read/written through simple assignment statements. For
> example, "AxisType.X = 0" assigns the value 0 to the AxisType parameter
> on the X axis. This is currently implemented in a seemingly terrible
> way, and I am looking for the best way to improve it.
>
> The current implementation involves providing a #include file that
> #defines each parameter as an array with a hard-coded index. This
> include file is handled by the pre-processor so that the syntax in
> question only has to handle the hard-coded array. The pre-processor is
> not too terribly inefficient, but the problem is that we have to
> distribute this enormous include file, and the users must remember to
> include it.
>
> I can imagine a couple of other ways to implement this, but I am not
> sure what way would be the most efficient. One way would be to add
> every parameter name as a keyword in the lexer. This has the benefit of
> relying on ANTLR to do all of the lexing for me, so that I don't have
> to parse any strings later in my own code. The problem is that this
> requires a lot of custom code in the grammar file (each token must have
> a well defined numeric index associated with it, to match the index
> used internally in the arrays). Additionally, I don't know how well
> ANTLR will handle having so many hundreds of additional tokens in the
> language. The good thing is that I could auto-generate the grammar from
> our definition of the parameters (in XML format).
>
> Alternatively, I could add a very generic rule to the lexer that would
> match any potentially valid parameter name, and wait until the semantic
> actions to validate this as an actual parameter or a syntax error.
> While this allows for a much simpler grammar on the ANTLR end, what I
> don't like about this is that I then have to write a bunch of C code that
> essentially parses the string again.
>
> So I am looking for some advice on the best way to approach this
> problem. If anyone has done something similar before, I would
> appreciate any suggestions that you have for me.
>
> Much thanks,
>
> Justin Murray
> [email protected]
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
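
To make the lookup-and-retoken step from the top of this reply a bit more concrete, here is a minimal C sketch. It is only an illustration under several assumptions: the token numbers, the value of the IS_RESERVED bit, and the names lookup_keyword and retoken are all made up for the example, and a sorted table searched with bsearch stands in for the lookup function gperf would generate. The ANTLR side (setting the token type in the lexer action) is left out because it depends on your target and grammar.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token numbers; in a real build these would come from
 * the ANTLR-generated lexer header. */
enum { TOK_ID = 4, TOK_ADD = 10, TOK_ALL, TOK_ALTER, TOK_ANALYZE, TOK_AND };

/* Assumed flag bit, chosen to sit above every token number. */
#define IS_RESERVED 0x1000

struct keyword {
    const char *name;   /* keyword text, upper case */
    int         token;  /* token number, optionally OR'ed with IS_RESERVED */
};

/* Must stay sorted by name, because bsearch is used below.
 * With gperf this table and the search are generated for you. */
static const struct keyword keywords[] = {
    { "ADD",     TOK_ADD     | IS_RESERVED },
    { "ALL",     TOK_ALL     | IS_RESERVED },
    { "ALTER",   TOK_ALTER   | IS_RESERVED },
    { "ANALYZE", TOK_ANALYZE | IS_RESERVED },
    { "AND",     TOK_AND     | IS_RESERVED },
};

/* Case-insensitive string comparison (mirrors %ignore-case). */
static int cmp_nocase(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = toupper((unsigned char)*a);
        int cb = toupper((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

static int cmp_keyword(const void *key, const void *elem)
{
    return cmp_nocase((const char *)key,
                      ((const struct keyword *)elem)->name);
}

/* Stand-in for the gperf-generated lookup function. */
static const struct keyword *lookup_keyword(const char *text)
{
    return bsearch(text, keywords,
                   sizeof keywords / sizeof keywords[0],
                   sizeof keywords[0], cmp_keyword);
}

/* Given text matched by the generic "match anything" rule, return the
 * token type to use instead. Unknown words stay plain identifiers. */
int retoken(const char *text, int *reserved)
{
    const struct keyword *kw = lookup_keyword(text);
    if (kw == NULL) {
        *reserved = 0;
        return TOK_ID;
    }
    *reserved = (kw->token & IS_RESERVED) != 0;
    return kw->token & ~IS_RESERVED;
}
```

In the lexer action for the generic rule you would call something like retoken on the matched text and set the token type to the result. For the parameter case, the table could just as well carry the array index of each parameter next to its name, generated from the same XML definition.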
