On 4/23/07, Alan Watson <[EMAIL PROTECTED]> wrote:
> In formal comment 231, I stated:
>
> "Many current Schemes have lexers written for ASCII (or Latin-1)
> character sets. Conversion of these lexers to the new standard would be
> easier if the report allowed inline hex escapes to appear anywhere in
> Scheme code."
>
> The editors replied:
>
> "It is unclear why converting the lexers would be significantly simpler
> through this change"
>
> Let me explain my original opinion. Many Schemes currently have lexers
> written in C using "char". These need converting to "long" to handle
> Unicode. Furthermore, table-driven approaches are practical for ASCII
> (128 values), but not practical for Unicode (roughly 2^24 values).
>
> In case that isn't clear enough: My Scheme uses flex for its lexer. I
> cannot see how to simply convert it to accept Unicode. I think I will
> have to dump flex and implement a new lexer by hand.

Normally you can make flex work on Unicode by converting the input to
UTF-8 before lexing it, having first rewritten the flex input to work
on UTF-8. It's not exactly pretty, but (speaking from experience) if
you don't mind accepting a superset of the valid characters for
identifiers it's not bad at all. State-dependent recognizers in the
flex input are very helpful here.

--lars
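
To make the UTF-8 approach concrete, here is a minimal sketch in Python
rather than flex, since the pattern structure carries over. It lexes the
encoded bytes, not code points, and treats every non-ASCII UTF-8
sequence as an identifier constituent, which is the "superset" trade-off
described above. The names (CONT, NONASCII, tokenize) and the tiny token
set are assumptions made for illustration; they are not taken from any
actual Scheme implementation, and the ASCII identifier classes are
simplified.

```python
import re

# Byte-level patterns, analogous to flex definitions over UTF-8 input.
# CONT is a UTF-8 continuation byte. NONASCII matches 2-, 3-, and 4-byte
# sequences by lead-byte range only; it does not reject overlong forms
# or surrogates, and it does not check Unicode general categories.
CONT = rb"[\x80-\xBF]"
NONASCII = (
    rb"(?:[\xC2-\xDF]" + CONT
    + rb"|[\xE0-\xEF]" + CONT + rb"{2}"
    + rb"|[\xF0-\xF4]" + CONT + rb"{3})"
)

# Simplified ASCII identifier classes (illustrative, not the full grammar).
ASCII_INITIAL = rb"[A-Za-z!$%&*/:<=>?^_~]"
ASCII_SUBSEQUENT = rb"[A-Za-z0-9!$%&*/:<=>?^_~+\-.@]"

# An identifier is an initial followed by subsequents, where any
# non-ASCII sequence is allowed in either position (the superset).
IDENT = (
    rb"(?:" + ASCII_INITIAL + rb"|" + NONASCII + rb")"
    + rb"(?:" + ASCII_SUBSEQUENT + rb"|" + NONASCII + rb")*"
)

TOKEN = re.compile(
    rb"(?P<ws>[ \t\r\n]+)"
    rb"|(?P<lparen>\()"
    rb"|(?P<rparen>\))"
    rb"|(?P<number>[0-9]+)"
    rb"|(?P<ident>" + IDENT + rb")"
)


def tokenize(text):
    """Yield (kind, lexeme) pairs, scanning the UTF-8 encoding of text."""
    data = text.encode("utf-8")  # convert to UTF-8 before lexing
    pos = 0
    while pos < len(data):
        m = TOKEN.match(data, pos)
        if m is None:
            raise ValueError("unexpected byte at offset %d" % pos)
        if m.lastgroup != "ws":
            yield (m.lastgroup, m.group().decode("utf-8"))
        pos = m.end()
```

With these rules, list(tokenize("(define λ 42)")) yields λ as an
ordinary identifier token. The cost of the shortcut is that a string
such as "a\u00a0b" (with a no-break space in the middle) also lexes as a
single identifier, which a stricter lexer would presumably reject; a
later pass can check the decoded lexeme if that matters.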
