Just a note to those who have participated in the Unicode identifiers
discussion. I have not forgotten about it; I have just changed the
proposed solution. Read below.

The old SableCC approach for identifiers and keywords was simply too
useful to throw away. Using Lexer instead of $lexer is more visually
attractive, for one thing. But mostly, the camel case conversion of old
identifiers was just too useful. Being able to convert some_name to
SomeName without problems (e.g. ambiguous upper case or no concept of
lower/upper case in some scripts) is very convenient.
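To illustrate the kind of problem meant here, the following is a minimal Python sketch (not SableCC code). The helper name capitalize_first is hypothetical, and it relies only on Python's built-in Unicode case mapping. It shows that upper-casing the first character is well behaved for lower-case ASCII but not for arbitrary Unicode text.

```python
def capitalize_first(word: str) -> str:
    """Upper-case the first character of a word (hypothetical helper)."""
    return word[:1].upper() + word[1:]


# Lower-case ASCII: the conversion is one-to-one and reversible.
assert capitalize_first("name") == "Name"

# Ambiguous upper case: "ß" upper-cases to the two letters "SS",
# and lower-casing "SS" gives "ss", not "ß".
assert capitalize_first("ßx") == "SSx"
assert "SS".lower() == "ss"

# No concept of case: CJK characters are left unchanged.
assert capitalize_first("名前") == "名前"
```

With identifiers restricted to lower-case ASCII, none of these corner cases can arise.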

So, I decided to retain the old "pure ASCII" identifiers (with the old
rules: no upper case, etc.). But I also allow rich identifiers,
made up of Unicode characters. A rich identifier is enclosed within "<"
and ">", and it may not contain the underscore "_" character.

This way, I can get unambiguous conversions and I am also able to
concatenate identifiers to create new names. For example:

  prod_name = {alt_name:} ... | ...;

Generates: PProdName, AProdName_AltName  (yes, different from SableCC3,
but it eliminates name conflicts).

  <Gagnon> = {<Étienne>:} ... | ...;

Generates: P_Gagnon, A_Gagnon__Étienne. In other words, rich identifiers
are converted by adding a "_" prefix.
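The naming scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual SableCC implementation: the function names (convert, production_class, alternative_class) are hypothetical, and the regular expression for plain identifiers is a guess at the "old rules" (lower-case ASCII words joined by single underscores).

```python
import re

# Assumption: plain identifiers are lower-case ASCII words joined by "_".
PLAIN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def convert(identifier: str) -> str:
    """Turn a grammar identifier into a name fragment (hypothetical helper)."""
    if identifier.startswith("<") and identifier.endswith(">"):
        body = identifier[1:-1]
        # Rich identifiers may not contain "_"; rejecting nested "<" or ">"
        # is an extra assumption of this sketch.
        if not body or any(c in body for c in "_<>"):
            raise ValueError(f"invalid rich identifier: {identifier!r}")
        return "_" + body  # rich identifier: just add a "_" prefix
    if not PLAIN.fullmatch(identifier):
        raise ValueError(f"invalid plain identifier: {identifier!r}")
    # Plain identifier: some_name -> SomeName
    return "".join(word.capitalize() for word in identifier.split("_"))


def production_class(prod: str) -> str:
    return "P" + convert(prod)


def alternative_class(prod: str, alt: str) -> str:
    return "A" + convert(prod) + "_" + convert(alt)


assert production_class("prod_name") == "PProdName"
assert alternative_class("prod_name", "alt_name") == "AProdName_AltName"
assert production_class("<Gagnon>") == "P_Gagnon"
assert alternative_class("<Gagnon>", "<Étienne>") == "A_Gagnon__Étienne"
```

In this sketch, "_" appears in a generated name only as a separator or as the rich-identifier prefix, never inside an identifier's own text. That is presumably why rich identifiers may not contain "_": it keeps the concatenated names unambiguous.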

This way, we (hopefully) get to please everybody. We make things easy
for normal uses, and possible for complex uses.

Etienne

Etienne M. Gagnon wrote:
> 1- Both Helpers and Tokens sections are merged into a single Lexer
> section. Ignored is a subsection of Lexer.
> [...]

-- 
Etienne M. Gagnon, Ph.D.
SableCC:                                            http://sablecc.org

