On 2/18/18 1:37 AM, James Laskey wrote:
Didn’t I hear someone mentioning “\U1D11A” at some point?

On 2/19/18 7:55 AM, Martin Buchholz wrote:
Oops, I already got it wrong - it's already at 6 hex digits because there are 17 planes, not 16.  MAX_CODE_POINT is U+10FFFF.
Yes, we need a variable width syntax like regex \x{h...h}
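
For reference, here is a minimal sketch of how the braced, variable-width form already behaves in java.util.regex today. The class and method names are made up for illustration, and the escape here is interpreted by the regex engine at run time, not by the compiler; nothing below is a proposed language syntax.

```java
import java.util.regex.Pattern;

public class VariableWidthEscapeDemo {
    // U+1D11A (MUSICAL SYMBOL FIVE-LINE STAFF), built from its code point
    // so that no source-level escape is needed.
    static String staff() {
        return new String(Character.toChars(0x1D11A));
    }

    // The braced form takes as many hex digits as the code point needs.
    static boolean matchesBraced(String s) {
        return Pattern.matches("\\x{1D11A}", s);
    }

    public static void main(String[] args) {
        String s = staff();
        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.println(matchesBraced(s));                // true
    }
}
```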

Yeah, there are a bunch of syntactic alternatives to consider. An "obvious" alternative to "\uxxxx" is "\Uxxxxxx", which works if you're always willing to specify six digits (or to have some weird non-delimited but variable-length sequence, such as the existing octal escapes for chars; does anybody use those? See JLS 3.10.6). The difference between \u and \U is rather subtle, though. A delimited sequence, such as the one used by regex, might be an alternative.
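
To make the status quo concrete, here is a rough sketch of what writing U+1D11A looks like with only the four-digit escape: the code point has to be split into a surrogate pair by hand. The class and method names are invented for the example.

```java
public class SurrogatePairDemo {
    // Today's spelling: two four-digit escapes forming a surrogate pair.
    static String viaEscapes() {
        return "\uD834\uDD1A";
    }

    // The same string built from the single code point at run time.
    static String viaCodePoint() {
        return new StringBuilder().appendCodePoint(0x1D11A).toString();
    }

    // Formats one UTF-16 code unit as four uppercase hex digits.
    static String hex(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(viaEscapes().equals(viaCodePoint()));   // true
        System.out.println(hex(Character.highSurrogate(0x1D11A))); // D834
        System.out.println(hex(Character.lowSurrogate(0x1D11A)));  // DD1A
    }
}
```

Any six-digit or delimited escape would let the programmer write the code point directly and leave this arithmetic to the compiler.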

And Java regex also supports
   \N{name}    The character with Unicode character name 'name'
so we could do the same for the Java language.
Although it would be a little weird to have every Unicode update make some previously invalid source files valid.
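
As a sketch of the existing run-time behavior (assuming Java 9 or later, where both \N{name} in regex and Character.codePointOf are available), the name is resolved against the Unicode tables of the running JDK, which is why the set of accepted names grows with each Unicode update. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class NamedCharacterDemo {
    // Matches input against a single character given by its Unicode name.
    // An unknown name makes Pattern.compile throw PatternSyntaxException.
    static boolean matchesByName(String unicodeName, String input) {
        return Pattern.matches("\\N{" + unicodeName + "}", input);
    }

    // Resolves a Unicode name to its code point.
    // An unknown name throws IllegalArgumentException.
    static int lookup(String unicodeName) {
        return Character.codePointOf(unicodeName);
    }

    public static void main(String[] args) {
        System.out.println(matchesByName("LATIN SMALL LETTER A", "a"));      // true
        System.out.println(Integer.toHexString(lookup("ZERO WIDTH JOINER"))); // 200d
    }
}
```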

We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in source files directly. No Unicode escapes needed.

Even if one is willing to have a source file in UTF-8 (as opposed to, say, ASCII), there are things in Unicode that are really hard to edit. For example, there are zero-width spaces, joiners, non-joiners, directionality markers, etc. I think escapes are the bare minimum. Some kind of name-based interpolation would be useful, but the actual Unicode names are rather unwieldy. Maybe something like HTML entities would be worthwhile to investigate, though probably with a different syntax.
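
A small illustration of the editing problem, offered as a sketch only: a string containing a zero-width joiner looks like two letters in most editors, while the escape makes the third character visible. The reveal helper is hypothetical and handles only format characters (general category Cf) in the Basic Multilingual Plane.

```java
public class InvisibleCharsDemo {
    // Rewrites each BMP format character (zero-width space, joiners,
    // directional marks, ...) as a visible four-digit escape and leaves
    // every other UTF-16 code unit untouched.
    static String reveal(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.getType(c) == Character.FORMAT) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Written with an escape; pasted literally, U+200D would be invisible.
        String withJoiner = "a\u200Db";
        System.out.println(withJoiner.length()); // 3
        // Prints the text with the joiner spelled out as an escape.
        System.out.println(reveal(withJoiner));
    }
}
```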

s'marks
