Just a note to those who participated in the Unicode identifiers discussion: I have not forgotten about it. I have just changed the proposed solution. Read below.
The old SableCC approach for identifiers and keywords was simply too
useful to throw away. Using Lexer instead of $lexer is more visually
attractive, for one thing. But mostly, the camel case conversion of old
identifiers was just too useful. Being able to convert some_name to
SomeName without problems (e.g. ambiguous upper case or no concept of
lower/upper case in some scripts) is very convenient.
So, I decided to retain the old "pure ASCII" identifiers (with the old
rules: no upper case, etc.). But I also allow for rich identifiers,
made up of Unicode characters. A rich identifier is enclosed within "<"
and ">", and it may not contain the underscore "_" character.
This way, I can get unambiguous conversions and I am also able to
concatenate identifiers to create new names. e.g.
prod_name = {alt_name:} ... | ...;
Generates: PProdName, AProdName_AltName (yes, different from SableCC3,
but it eliminates name conflicts).
<Gagnon> = {<Étienne>:} ... | ...;
Generates: P_Gagnon, A_Gagnon__Étienne. In other words, rich identifiers
are converted by adding a "_" prefix.
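To make the conversion rules above concrete, here is a minimal Java sketch. It is only an illustration under assumptions: the class and method names (NameSketch, convert, productionName, alternativeName) are hypothetical and are not taken from the SableCC sources, and the ASCII branch does not check the full set of old identifier rules.

```java
// Hypothetical sketch of the naming scheme described above; not SableCC code.
final class NameSketch {

    // Converts one grammar identifier into a Java name fragment.
    static String convert(String id) {
        // Rich identifier: enclosed in "<" and ">", must not contain "_".
        if (id.length() > 2 && id.startsWith("<") && id.endsWith(">")) {
            String inner = id.substring(1, id.length() - 1);
            if (inner.indexOf('_') >= 0) {
                throw new IllegalArgumentException(
                        "rich identifier may not contain '_': " + id);
            }
            // Rich identifiers are kept as-is, with a "_" prefix.
            return "_" + inner;
        }
        // Pure ASCII identifier: some_name becomes SomeName.
        // Assumption: the input already follows the old rules (no upper case, etc.).
        StringBuilder sb = new StringBuilder();
        for (String part : id.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0)));
            sb.append(part.substring(1));
        }
        return sb.toString();
    }

    // Production class name: "P" followed by the converted production name.
    static String productionName(String prod) {
        return "P" + convert(prod);
    }

    // Alternative class name: "A", the production name, "_", the alternative name.
    static String alternativeName(String prod, String alt) {
        return "A" + convert(prod) + "_" + convert(alt);
    }

    public static void main(String[] args) {
        System.out.println(productionName("prod_name"));
        System.out.println(alternativeName("prod_name", "alt_name"));
        System.out.println(productionName("<Gagnon>"));
        System.out.println(alternativeName("<Gagnon>", "<Étienne>"));
    }
}
```

Because a rich identifier can never contain "_" and a converted ASCII identifier never does either, the "_" separator and the "_" prefix stay unambiguous in the generated names.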
This way, we (hopefully) get to please everybody. We make things easy
for normal uses, and possible for complex uses.
Etienne
Etienne M. Gagnon wrote:
> 1- Both Helpers and Tokens sections are merged into a single Lexer
> section. Ignored is a subsection of Lexer.
> [...]
--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org
