Hello John, others,

I'll address the question of ASCII vs. Latin, and which Latin exactly, in a separate mail.

This mail is just about NFC/Turkish, in particular to correct some misconceptions.

On 2025-11-02 12:07, John C Klensin wrote:
Jean, Martin,

One set of comments here illustrate why I have been urging caution in
making seemingly innocuous changes in response to seemingly innocuous
comments/ suggestions...

--On Thursday, October 30, 2025 13:10 -0500 Jean Mahoney
<[email protected]> wrote:

Current RPC operational procedure: Postal addresses are not
required in RFCs; however, if one is provided, the RPC will
update a country name to match the English short name for the
country found here: https://www.iso.org/obp/ui/#search. This is
specified in the RFC Style Guide:
https://www.rfc-editor.org/rfc/rfc7322#section-4.12

I believe there was already feedback to also include the ASCII
equivalent here.

To be exact, this should be a Latin script equivalent, not an
ASCII equivalent.

[JM] Ack

Maybe not. A "Latin script equivalent" includes non-ASCII characters
used in common (and contemporary) Western European and Western
European languages and is a useful rule for, e.g., allowing Martin to
spell his name correctly.  But it is not limited to that.  Maybe a
rule about ASCII and what Unicode called the "Latin-1 Supplement"
(U+00CA through U+00FF or maybe even U+00A1 through U+00FF) would
work, although even that could lead to issues with dotless-i
(U+0131), which can cause NFC to fail unless the language is known,

No. Turkish (and other Turkic languages) need care with case conversion (upper case to lower case and back), because dotted and dotless i are distinct letters there. NFC itself is absolutely no problem for Turkish.
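
For anyone who wants to check this, here is a minimal sketch, assuming Python 3 and its standard unicodedata module (the variable names are just for illustration). It suggests that NFC leaves dotless i alone, and that the surprise only shows up in case conversion, where Python's str.upper()/str.lower() apply the default, language-independent mappings:

```python
import unicodedata

dotless_i = "\u0131"  # LATIN SMALL LETTER DOTLESS I
dotted_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE

# U+0131 has no canonical decomposition, so NFC and NFD both leave it as is.
nfc_dotless = unicodedata.normalize("NFC", dotless_i)
nfd_dotless = unicodedata.normalize("NFD", dotless_i)

# U+0130 decomposes to I + COMBINING DOT ABOVE and recomposes cleanly.
nfd_dotted = unicodedata.normalize("NFD", dotted_I)
nfc_roundtrip = unicodedata.normalize("NFC", nfd_dotted)

# Case conversion is where the language matters: the default mappings
# turn dotless i into plain "I", and "I" back into dotted "i".
upper_of_dotless = dotless_i.upper()
lower_roundtrip = upper_of_dotless.lower()

print(nfc_dotless == dotless_i, nfc_roundtrip == dotted_I)
print(repr(upper_of_dotless), repr(lower_roundtrip))
```

So the round trip that loses information is upper/lower, not normalization.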


and the Turkish / Romanian font style problem that the Unicode
Standard points out.

This was indeed a problem up to Unicode 2.1. The 'splitters' (as opposed to the 'lumpers') won (as they almost always do), and new characters were encoded in Unicode 3.0 (Sept. 1999).

Now we have these characters for Turkish:

015E;LATIN CAPITAL LETTER S WITH CEDILLA;Lu;0;L;0053 0327;;;;N;LATIN CAPITAL LETTER S CEDILLA;;;015F;
015F;LATIN SMALL LETTER S WITH CEDILLA;Ll;0;L;0073 0327;;;;N;LATIN SMALL LETTER S CEDILLA;;015E;;015E
0162;LATIN CAPITAL LETTER T WITH CEDILLA;Lu;0;L;0054 0327;;;;N;LATIN CAPITAL LETTER T CEDILLA;;;0163;
0163;LATIN SMALL LETTER T WITH CEDILLA;Ll;0;L;0074 0327;;;;N;LATIN SMALL LETTER T CEDILLA;;0162;;0162

And these characters for Romanian:

0218;LATIN CAPITAL LETTER S WITH COMMA BELOW;Lu;0;L;0053 0326;;;;N;;;;0219;
0219;LATIN SMALL LETTER S WITH COMMA BELOW;Ll;0;L;0073 0326;;;;N;;;0218;;0218
021A;LATIN CAPITAL LETTER T WITH COMMA BELOW;Lu;0;L;0054 0326;;;;N;;;;021B;
021B;LATIN SMALL LETTER T WITH COMMA BELOW;Ll;0;L;0074 0326;;;;N;;;021A;;021A

The Romanian-language Wikipedia page about the Romanian language (at https://ro.wikipedia.org/wiki/Limba_română) uses the latter, so it seems that in practice the problem you point out is essentially gone.
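
As a quick check of the data above, here is a small sketch, again assuming Python 3 with the standard unicodedata module (the helper name describe is made up for this mail). It looks up the names and canonical decompositions of the two small s letters and confirms that NFC does not merge them:

```python
import unicodedata

def describe(ch):
    """Return (character name, NFD code points as hex strings) for one character."""
    nfd = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), [f"{ord(c):04X}" for c in nfd]

s_cedilla = "\u015F"      # s with cedilla
s_comma_below = "\u0219"  # s with comma below

cedilla_info = describe(s_cedilla)
comma_info = describe(s_comma_below)

# NFC keeps each precomposed character as it is, so the two stay distinct.
still_distinct = (
    unicodedata.normalize("NFC", s_cedilla)
    != unicodedata.normalize("NFC", s_comma_below)
)

print(cedilla_info)
print(comma_info)
print(still_distinct)
```

The decompositions differ only in the combining mark (U+0327 vs. U+0326), which matches the UnicodeData entries quoted above.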

Regards,    Martin.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
