[Rswg] Re: feedback on draft-rswg-rfc7997bis-05 (about NFC/Turkish)

Martin J. Dürst Sun, 02 Nov 2025 18:53:43 -0800

Hello John, others,

I'm going to answer the question of ASCII vs. Latin, and which Latin exactly, in a separate mail.

This mail is just about NFC/Turkish, in particular to correct some misconceptions.


On 2025-11-02 12:07, John C Klensin wrote:

Jean, Martin,

One set of comments here illustrate why I have been urging caution in
making seemingly innocuous changes in response to seemingly innocuous
comments/ suggestions...

--On Thursday, October 30, 2025 13:10 -0500 Jean Mahoney
<[email protected]> wrote:

Current RPC operational procedure: Postal addresses are not
required in RFCs; however, if one is provided, the RPC will
update a country name to match the English short name for the
country found here: https://www.iso.org/obp/ui/#search. This is
specified in the RFC Style Guide:
https://www.rfc-editor.org/rfc/rfc7322#section-4.12

I believe there was already feedback to also include the ASCII
equivalent here.


To be exact, this should be a Latin script equivalent, not an
ASCII  equivalent.


[JM] Ack


Maybe not. A "Latin script equivalent" includes non-ASCII characters
used in common (and contemporary) Western European and Western
European languages and is a useful rule for, e.g., allowing Martin to
spell his name correctly.  But it is not limited to that.  Maybe a
rule about ASCII and what Unicode called the "Latin-1 Supplement"
(U+00CA through U+00FF or maybe even U+00A1 through U+00FF) would
work, although even that could lead to issues with dotless-i
(U+0131), which can cause NFC to fail unless the language is known,

No. Care is needed for Turkish (and Turkic languages) when using case conversion (upper case to lower case and back). There is absolutely no problem with NFC for Turkish.
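
A minimal sketch of that distinction, using only Python's standard unicodedata module. The constant and helper names are made up for illustration, and the case mapping shown is Python's default language-neutral one, not a Turkish-tailored one:

```python
import unicodedata

DOTLESS_I = "\u0131"     # LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def nfc(text: str) -> str:
    """Normalize to NFC; this step needs no language information."""
    return unicodedata.normalize("NFC", text)


def survives_case_round_trip(text: str) -> bool:
    """True if default (language-neutral) upper() then lower() restores text."""
    return text.upper().lower() == text


if __name__ == "__main__":
    for ch in (DOTLESS_I, DOTTED_CAP_I):
        print(unicodedata.name(ch),
              "| unchanged by NFC:", nfc(ch) == ch,
              "| survives case round trip:", survives_case_round_trip(ch))
```

Here NFC leaves both characters as they are, whatever the language. The default case round trip, by contrast, does not bring dotless i back, because without Turkish tailoring "I" lowercases to the dotted "i".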

and the Turkish / Romanian font style problem that the Unicode
Standard points out.

This was indeed a problem up to Unicode 2.1. The 'splitters' (as opposed to the 'lumpers') won (as they almost always do), and new characters were encoded in Unicode 3.0 (Sept. 1999).


Now we have these characters for Turkish:

015E;LATIN CAPITAL LETTER S WITH CEDILLA;Lu;0;L;0053 0327;;;;N;LATIN CAPITAL LETTER S CEDILLA;;;015F;

015F;LATIN SMALL LETTER S WITH CEDILLA;Ll;0;L;0073 0327;;;;N;LATIN SMALL LETTER S CEDILLA;;015E;;015E

0162;LATIN CAPITAL LETTER T WITH CEDILLA;Lu;0;L;0054 0327;;;;N;LATIN CAPITAL LETTER T CEDILLA;;;0163;

0163;LATIN SMALL LETTER T WITH CEDILLA;Ll;0;L;0074 0327;;;;N;LATIN SMALL LETTER T CEDILLA;;0162;;0162


And these characters for Romanian:

0218;LATIN CAPITAL LETTER S WITH COMMA BELOW;Lu;0;L;0053 0326;;;;N;;;;0219;

0219;LATIN SMALL LETTER S WITH COMMA BELOW;Ll;0;L;0073 0326;;;;N;;;0218;;0218

021A;LATIN CAPITAL LETTER T WITH COMMA BELOW;Lu;0;L;0054 0326;;;;N;;;;021B;

021B;LATIN SMALL LETTER T WITH COMMA BELOW;Ll;0;L;0074 0326;;;;N;;;021A;;021A

The Wikipedia page about the Romanian language written in Romanian (at https://ro.wikipedia.org/wiki/Limba_română) uses the latter, so it seems that in practice the problem you point out is essentially gone.
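
For anyone who wants to check the two sets of characters locally, here is a small sketch using Python's standard unicodedata module (the constant and helper names are hypothetical). It reads the same name and decomposition fields as the UnicodeData.txt entries quoted above, and confirms that NFC keeps the cedilla and comma-below letters distinct:

```python
import unicodedata

S_CEDILLA = "\u015f"  # LATIN SMALL LETTER S WITH CEDILLA
S_COMMA = "\u0219"    # LATIN SMALL LETTER S WITH COMMA BELOW
T_CEDILLA = "\u0163"  # LATIN SMALL LETTER T WITH CEDILLA
T_COMMA = "\u021b"    # LATIN SMALL LETTER T WITH COMMA BELOW


def describe(ch: str) -> tuple[str, str]:
    """Return (character name, canonical decomposition field) for ch."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)


def nfc_keeps_distinct(a: str, b: str) -> bool:
    """True if a and b are still different strings after NFC."""
    return unicodedata.normalize("NFC", a) != unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for ch in (S_CEDILLA, S_COMMA, T_CEDILLA, T_COMMA):
        name, decomposition = describe(ch)
        print(f"U+{ord(ch):04X} {name}: {decomposition}")
    print("s pair distinct under NFC:", nfc_keeps_distinct(S_CEDILLA, S_COMMA))
    print("t pair distinct under NFC:", nfc_keeps_distinct(T_CEDILLA, T_COMMA))
```

The cedilla forms decompose to the base letter plus U+0327 and the comma-below forms to the base letter plus U+0326, so normalization never merges one into the other.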


Regards,    Martin.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
