Re: RFR: 8197594 - String and character repeat

Stuart Marks Mon, 26 Feb 2018 16:59:30 -0800

On 2/18/18 1:37 AM, James Laskey wrote:

Didn’t I hear someone mentioning “\U1D11A” at some point?


On 2/19/18 7:55 AM, Martin Buchholz wrote:

Oops, I already got it wrong - it's already at 6 hex digits because there are 17 planes, not 16. MAX_CODE_POINT is U+10FFFF.
Yes, we need a variable width syntax like regex \x{h...h}

Yeah, there are a bunch of syntactic alternatives to consider. An "obvious" alternative to "\uxxxx" is "\Uxxxxxx", which works if you're always willing to specify six digits (or to have some weird non-delimited but variable-length sequence, such as the existing octal escapes for chars; does anybody use those? See JLS 3.10.6). The difference between \u and \U is rather subtle, though. Or a delimited sequence such as used by regex might be an alternative.
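
To make the limitation concrete, here is a minimal sketch of how a code point above U+FFFF has to be written today. The class and method names are made up for illustration; only the java.lang.Character and String calls are real API. With four-digit \u escapes, U+1D11A must be spelled as a surrogate pair, or else built from the code point at run time.

```java
final class SupplementaryEscapes {
    // U+1D11A, the code point from the quoted example; it lies outside the BMP.
    static final int STAFF = 0x1D11A;

    // Hypothetical helper: builds a String from any valid code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // With four-digit escapes the only literal spelling is a surrogate pair.
        String viaSurrogates = "\uD834\uDD1A";
        String viaCodePoint = fromCodePoint(STAFF);

        System.out.println(viaSurrogates.equals(viaCodePoint));                    // true
        System.out.println(viaCodePoint.length());                                 // 2 (UTF-16 code units)
        System.out.println(viaCodePoint.codePointCount(0, viaCodePoint.length())); // 1
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));         // 10ffff
    }
}
```

The surrogate values have to be computed by hand, which is the main argument for a six-digit or delimited form.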

And java regex also supports
   \N{name}   The character with Unicode character name 'name'
so we could do the same for the java language.
Although it would be a little weird to have every Unicode update make some previously invalid source files valid.
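
For reference, a small sketch of the two regex forms mentioned above, as java.util.regex accepts them today. The class and method names are hypothetical, and this assumes a JDK where \N{name} is available in Pattern (Java 9 or later, as far as I know); \x{h...h} has been around longer.

```java
import java.util.regex.Pattern;

final class RegexCodePointEscapes {
    // Variable-width hex form: braces delimit any number of hex digits.
    static boolean isStaff(String s) {
        return Pattern.matches("\\x{1D11A}", s);
    }

    // Name-based form: resolved against the Unicode data shipped with the JDK.
    static boolean isSmiley(String s) {
        return Pattern.matches("\\N{WHITE SMILING FACE}", s);
    }

    public static void main(String[] args) {
        String staff = new String(Character.toChars(0x1D11A));
        System.out.println(isStaff(staff));     // true
        System.out.println(isSmiley("\u263A")); // true
        System.out.println(isSmiley("x"));      // false
    }
}
```

Because the name is looked up in the JDK's Unicode tables, a pattern using a newly assigned name only compiles on a JDK that knows it, which is the same versioning concern raised for source files.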
We could also say "It's 2018 and UTF-8 has won" and simply use UTF-8 in source files directly. No Unicode escapes needed.

Even if one is willing to have a source file in UTF-8 (as opposed to, say, ASCII) there are things in Unicode that are really hard to edit. For example, there are zero-width spaces, joiners, non-joiners, directionality markers, etc. I think escapes are the bare minimum. Some kind of name-based interpolation would be useful, but the actual Unicode names are rather unwieldy. Maybe something like HTML entities would be worthwhile to investigate, though probably with a different syntax.
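
A quick sketch of why the invisible characters are a problem. The class and the helper are hypothetical, written only for this illustration: two strings that render identically in most editors differ by a zero-width space, and only an escaped form makes the difference visible.

```java
final class InvisibleCharacters {
    // Hypothetical helper: renders every char outside printable ASCII
    // as a four-digit backslash-u escape so it can be seen in plain text.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String plain = "ab";
        String withZwsp = "a\u200Bb"; // ZERO WIDTH SPACE between the letters

        System.out.println(plain.equals(withZwsp));   // false
        System.out.println(withZwsp.length());        // 3
        System.out.println(escapeNonAscii(withZwsp)); // a\u200Bb
    }
}
```

Written as a raw UTF-8 character, the zero-width space in the second literal would be indistinguishable from nothing at all in most editors.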


s'marks
