Re: [Scheme-reports] DISCUSSION/VOTE: The character tower

John Cowan Mon, 05 May 2014 23:26:27 -0700

Bear scripsit:

> Yes, with the exception of code points which are not actually mapped to
> any character by the Unicode standard.


For clarification, which of these do you mean?

(a) Code points which will never correspond to any character, namely
the surrogates?  (These are already excluded by -small.)

(b) Code points for reserved noncharacters (there are 66 of these;
they are not to be used in interchange, but may be useful internally to
a program)?

(c) Code points that will (or at least may) be assigned to characters in
future versions of Unicode?
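
To make the three cases concrete, here is a minimal Python sketch that sorts a code point into (a), (b), or (c). The function name `classify` and its labels are made up for illustration. Case (c) is judged by the general category "Cn" in whatever Unicode database the running Python ships, so that answer can change between versions.

```python
import unicodedata

def classify(cp: int) -> str:
    """Roughly sort a code point into cases (a)-(c); hypothetical helper."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"        # case (a)
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"     # case (b)
    # Remaining "Cn" code points are unassigned in this Python's Unicode data.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"       # case (c)
    # Everything else, including private-use code points, counts as assigned here.
    return "assigned"

# Enumerate every noncharacter; len(noncharacters) gives the total.
noncharacters = [cp for cp in range(0x110000) if classify(cp) == "noncharacter"]
```
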

> > 7) Should R7RS-large implementations be required to
> > provide the characters from #\x10000 to #\x10FFFF?  
> 
> No.

I'm curious why you reject these, seemingly out of hand.  They are
required by a lot of scripts, though mostly archaic and minority-use ones.
You similarly reject #11 without explanation.
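
For a sense of what lives in that range, a small Python sketch follows. The three sample code points (Gothic, cuneiform, and an emoji) are my own picks, and `is_supplementary` is a hypothetical helper. Each of these characters needs two code units in UTF-16, which is the usual reason implementations find them awkward.

```python
import unicodedata

def is_supplementary(ch: str) -> bool:
    """True if ch lies in the range #\\x10000 to #\\x10FFFF (hypothetical helper)."""
    return 0x10000 <= ord(ch) <= 0x10FFFF

for cp in (0x10330, 0x12000, 0x1F600):
    ch = chr(cp)
    # Every character above U+FFFF is a surrogate pair in UTF-16.
    units = len(ch.encode("utf-16-be")) // 2
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {units} UTF-16 code units")
```
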

> > 8) Should R7RS-large implementations be required to allow #\x0 in strings?
> 
> Abstention.  If an implementation is serious enough about Unicode
> support to keep its strings in a Unicode normalized form, which ought
> not be forbidden, then NUL can never appear in any string. 

I don't understand this remark at all.  The normalized form of the U+0000
character under any normalization form is quite simply itself.  The
internal encoding of the characters with or without 0 bytes is not
relevant here.
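
This is easy to check with any normalization library. A minimal sketch using Python's `unicodedata` module (the helper name `survives_normalization` is made up for illustration):

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def survives_normalization(s: str) -> bool:
    """True if s is left unchanged by all four normalization forms."""
    return all(unicodedata.normalize(form, s) == s for form in FORMS)

# U+0000 on its own, and embedded in an otherwise ordinary string.
nul_ok = survives_normalization("\x00")
embedded_ok = survives_normalization("a\x00b")
```

Both checks come out true, so a string kept in normalized form can still contain NUL.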

> Yes, with the exception of code points which are not actually mapped to
> any character by the Unicode standard and code points which have a
> canonical decomposition (i.e., the standard ought to allow an
> implementation to implement strings as Unicode-normalized strings).

That is, in normalization form D, I assume you mean.  (Normalization form
C is more commonly used, and actually encourages the use of characters
with a canonical decomposition.)
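
The difference between the two forms can be seen on a single accented letter. A short Python sketch, with "é" as an arbitrary example of my own choosing:

```python
import unicodedata

composed = "\u00e9"      # é as one code point, which has a canonical decomposition
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# NFD replaces the precomposed character with its decomposition.
nfd_of_composed = unicodedata.normalize("NFD", composed)

# NFC goes the other way and produces the precomposed character.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
```

So a string in form D never contains U+00E9, while a string in form C prefers it.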

> Identifiers which are distinct when in NFKC/NFKD normalized form
> must be considered distinct by all implementations.  Identifiers which
> are not distinct when normalized as NFC/NFD must _not_ be considered
> distinct by any implementation.  The standard should give a definite
> rule about identifiers which are distinct in NFC/NFD normalizations, but
> identical in NFKC/NFKD normalizations; are they to be considered
> distinct, considered identical, or is that implementation-defined?

This is an interesting point which I will probably ballot later.
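
An example of the in-between case, sketched in Python: an identifier spelled with the "fi" ligature (U+FB01) against the same identifier in plain ASCII. The identifiers and the helper `same_identifier` are hypothetical, chosen only to illustrate the question.

```python
import unicodedata

def same_identifier(a: str, b: str, form: str) -> bool:
    """Compare two identifiers after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

ligature = "\ufb01le"   # LATIN SMALL LIGATURE FI followed by "le"
plain = "file"

# The ligature has only a compatibility decomposition, so canonical
# normalization keeps the two spellings apart.
distinct_under_nfc = not same_identifier(ligature, plain, "NFC")

# Compatibility normalization folds the ligature to "fi", so they coincide.
identical_under_nfkc = same_identifier(ligature, plain, "NFKC")
```

Both flags are true here, so this pair is exactly the kind the proposed rule leaves open.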

-- 
John Cowan          http://www.ccil.org/~cowan        [email protected]
Female celebrity stalker, on a hot morning in Cairo:
"Imagine, Colonel Lawrence, ninety-two already!"
El Auruns's reply:  "Many happy returns of the day!"

_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
