[Rswg] Re: feedback on draft-rswg-rfc7997bis-05 (about Latin script)

Martin J . Dürst Mon, 03 Nov 2025 01:39:08 -0800

Hello everybody,

Sorry for writing such long mails; this should be the last one for today.


On 2025-11-03 00:50, Carsten Bormann wrote:

On Nov 2, 2025, at 10:17, Pete Resnick <[email protected]> wrote:


I would find a Latin script equivalent of the Pinyin version including tone marks, 
"Huáwéi Jìshù Yǒuxiàn Gōngsī" helpful, because I like getting the pronunciation 
right, I imagine most people would not.


You might be surprised how many people are able to ignore those accents, or 
actually derive value from them.

I think ASCII-only (and perhaps English translation) is really what we're 
looking for.


Absolutely not.

Latin script is pretty much readable to anyone who can read English documents.
Those who have trouble ingesting Patrik Faltström or Jürgen Schönwälder can 
speak up.
I don’t think either would want to give you ASCII transliterations, even if 
they know people won’t pronounce the Latin right (but that’s true of ASCII 
French names as well).

Most importantly, it's true of many English names as well. And in most cases, "getting the pronunciation right" is definitely not about ASCII transliterations or IPA. If a phoneme doesn't exist in the languages a speaker is used to, it's difficult to get it right, no matter how hard one tries to indicate the correct pronunciation with whatever sequence of characters.

The question has been raised why anybody would use a non-Latin script for their name and then provide a Latin equivalent that is still not fully ASCII. I agree that's a good question. However, I think that a non-ASCII equivalent for a non-Latin name would be rather rare, because most cultures where a non-Latin script is the first script learn English as the second language, and mainly use ASCII when writing names in Latin. Exceptions I can imagine are people from northwestern Africa (where Arabic is the first language and French the second) and similar situations. And in the vast majority of these rather rare cases, it's just about ignoring the accents, and an ASCII-only variant would be exactly the same as the name with the accents removed anyway. (German, where "ü" is written "ue" as a fallback, is a rare exception.)
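To make the "ASCII-only is just removing the accents" point concrete, here is a minimal Python sketch. It assumes that Unicode canonical decomposition (NFD) followed by dropping combining marks is an acceptable way to remove accents; the function names and the German fallback table are hypothetical, for illustration only. Letters without a canonical decomposition, such as ß, ø or ł, pass through `strip_accents` unchanged.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Remove accents: decompose (NFD), then drop characters with a
    non-zero combining class. Letters without a canonical decomposition
    (e.g. ß, ø, ł) are kept as-is.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)

# Hypothetical fallback table following the German convention (ü -> ue).
GERMAN_FALLBACK = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def german_fallback(text: str) -> str:
    """Apply the German convention first, then strip any remaining accents."""
    composed = unicodedata.normalize("NFC", text)  # so "ü" is one code point
    return strip_accents(composed.translate(GERMAN_FALLBACK))

if __name__ == "__main__":
    print(strip_accents("Huáwéi Jìshù Yǒuxiàn Gōngsī"))  # Huawei Jishu Youxian Gongsi
    print(strip_accents("Schönwälder"))                  # Schonwalder
    print(german_fallback("Schönwälder"))                # Schoenwaelder
```

The last two calls show the difference between simply dropping the accents and the German "ü" to "ue" convention.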

I also think it would be unfair to allow non-ASCII Latin in the (primary) name but not in an equivalent for a non-Latin name: Germans would get their umlauts, but Chinese authors, for example, would not be allowed to use tone marks even if they wanted to (which, as far as I can tell from the RFCs I have seen, hasn't been the case up to now).

There has been some discussion (including some in private) about which parts of the Latin script should be allowed. I'll summarize the discussion here and give my own opinion(s).

Besides Basic Latin (i.e., A-Z and a-z), there was a proposal to allow only U+00C0 to U+00FF (the proposal said U+00CA through U+00FF, but U+00CA must be a typo). This is the part of the Latin-1 Supplement block (U+0080 to U+00FF) that contains its accented letters. It would only cover Western European languages, which would be a relic of the Cold War.
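The Latin-1 proposal is easy to check mechanically. Below is a rough Python sketch; the allowed punctuation set and the function name are my own assumptions, not part of any proposal. It accepts names like those mentioned earlier in this thread, but rejects e.g. Czech "Dvořák" or Pinyin with tone marks, because ř (U+0159) and ō (U+014D) are outside U+00C0 to U+00FF.

```python
import unicodedata

# Assumption: besides letters, names may contain only these characters.
ALLOWED_PUNCTUATION = set(" -'.")

def within_latin1_proposal(name: str) -> bool:
    """True if every letter is A-Z, a-z, or a letter in U+00C0..U+00FF."""
    for ch in unicodedata.normalize("NFC", name):
        if ch in ALLOWED_PUNCTUATION:
            continue
        if ch.isascii() and ch.isalpha():
            continue  # Basic Latin letter
        if "\u00c0" <= ch <= "\u00ff" and ch.isalpha():
            continue  # Latin-1 letter; isalpha() rules out × (U+00D7) and ÷ (U+00F7)
        return False
    return True

if __name__ == "__main__":
    print(within_latin1_proposal("Jürgen Schönwälder"))  # True
    print(within_latin1_proposal("Dvořák"))              # False: ř is U+0159
    print(within_latin1_proposal("Gōngsī"))              # False: ō is U+014D
```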

There was also a proposal to say that "non-Latin script, or in a Latin script including unusual characters, should be accompanied by an equivalent in normal Latin script." The intent seems clear, but "unusual characters" is highly subjective, and "normal", with a potential range of not very flattering opposites, can sound highly judgemental.

There was also a (somewhat implicit) proposal to allow Basic Latin letters with accents, overstrikes, and so on, but not different base shapes. https://en.wikipedia.org/wiki/List_of_Latin-script_letters has a very long list of actually used letters, with usage notes. The main sections of interest to us are Other Letters, Letters with Diacritics, and Digraphs and Ligatures. We could say that we allow all Letters with Diacritics as long as they have a Basic Latin base. But that would exclude German ß (called sharp s or sz). [The ĳ ligature (U+0133) was also mentioned, but it's correctly classified as a stylistic ligature, so it doesn't really get used in text, and it doesn't have to be distinguished from simple "ij".]
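The "Basic Latin base" rule can be approximated with canonical decomposition: decompose the letter (NFD) and check that what remains is one ASCII letter followed only by combining marks. This is a sketch under that assumption, not a faithful encoding of the Wikipedia classification: letters with strokes or other overstrikes (ø, ł, đ) have no canonical decomposition in Unicode, so this check rejects them along with ß, even though the proposal meant to include overstrikes.

```python
import unicodedata

def has_basic_latin_base(ch: str) -> bool:
    """True if `ch` decomposes (NFD) to one ASCII letter plus combining marks.

    Approximation: letters whose diacritic is not a separate combining mark
    in Unicode (stroke letters such as ø, ł, đ) have no canonical
    decomposition and are rejected, as is ß.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    if not decomposed:
        return False
    base, marks = decomposed[0], decomposed[1:]
    return (base.isascii() and base.isalpha()
            and all(unicodedata.combining(m) for m in marks))

if __name__ == "__main__":
    for letter in ["é", "ǒ", "\u1ec7", "ß", "ø", "\u0133"]:
        print(letter, has_basic_latin_base(letter))
```

Plain ASCII letters pass trivially. The ĳ ligature is rejected here, but compatibility normalization maps it to plain "ij" (`unicodedata.normalize("NFKC", "\u0133")` gives `"ij"`), which fits the observation that it doesn't have to be distinguished.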

In the Latin script, there is a very strong long-tail phenomenon, i.e., a quickly decreasing frequency for rarer and rarer letters. Also, most of the 'Other Letters', i.e., those that don't have a Basic Latin letter as a base, are used for IPA or other phonetic systems, or for historic languages. So my opinion is that even if we allow all of the Latin script, the chance that we'll see a German ß or an Icelandic Ð/ð/Þ/þ, which are all in Latin-1 (U+00C0 to U+00FF), is way, way higher than the chance of seeing any other "unknown shape classified nevertheless as Latin". So we might as well leave it at "Latin script", and enjoy a 'weird' unknown character once in 10'000 RFCs or so.
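Checking for "Latin script" as a whole would ideally use the Unicode Script property (UAX #24). Python's standard `unicodedata` module does not expose that property, so the sketch below uses a heuristic that I'm assuming is good enough for illustration: a letter counts as Latin if its Unicode character name starts with "LATIN ". This accepts ß, ð and þ as well as rare shapes such as ǝ (U+01DD, LATIN SMALL LETTER TURNED E), i.e. exactly the occasional 'weird' character, and rejects Greek or Han characters. It misses a few Script=Latin characters whose names don't start with "LATIN", such as ª (U+00AA).

```python
import unicodedata

# Assumption: besides letters, names may contain only these characters.
ALLOWED_PUNCTUATION = set(" -'.")

def looks_latin(name: str) -> bool:
    """Heuristic stand-in for Script=Latin, based on Unicode character names.

    The standard library has no Script property, so a letter counts as Latin
    if its name starts with "LATIN ". Nonspacing marks (category Mn) left
    over after NFC are accepted without further checks (a simplification).
    """
    for ch in unicodedata.normalize("NFC", name):
        if ch in ALLOWED_PUNCTUATION:
            continue
        if unicodedata.category(ch) == "Mn":
            continue
        if not unicodedata.name(ch, "").startswith("LATIN "):
            return False
    return True

if __name__ == "__main__":
    print(looks_latin("Þórður"))   # True
    print(looks_latin("Straße"))   # True
    print(looks_latin("\u01dd"))   # True: LATIN SMALL LETTER TURNED E
    print(looks_latin("华为"))      # False: Han characters
```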


Regards, Martin.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
