Hello everybody,
Sorry for writing such long mails; this should be the last one for today.
On 2025-11-03 00:50, Carsten Bormann wrote:
> On Nov 2, 2025, at 10:17, Pete Resnick <[email protected]> wrote:
>> I would find a Latin script equivalent of the Pinyin version including tone marks,
>> "Huáwéi Jìshù Yǒuxiàn Gōngsī" helpful, because I like getting the pronunciation
>> right, I imagine most people would not.
> You might be surprised how many people are able to ignore those accents, or
> actually derive value from them.
>> I think ASCII-only (and perhaps English translation) is really what we're
>> looking for.
> Absolutely not.
> Latin script is pretty much readable to anyone who can read English documents.
> Those who have trouble ingesting Patrik Faltström or Jürgen Schönwälder can
> speak up.
> I don’t think either would want to give you ASCII transliterations, even if
> they know people won’t pronounce the Latin right (but that’s true of ASCII
> French names as well).
Most importantly, it's true of many English names as well. And in most
cases, "getting the pronunciation right" is definitely not about ASCII
transliterations or IPA. If a phoneme doesn't exist in the languages a
speaker is used to, it's difficult to get it right, no matter how hard
one tries to indicate the correct pronunciation with whatever sequence
of characters.
The question has been raised why anybody would write their name in a
non-Latin script and then provide a Latin equivalent that is still not
fully ASCII. I agree that's a good question. However, I think that a
non-ASCII equivalent for a non-Latin name would be rather rare, because in
most cultures where a non-Latin script is primary, English is learned as
the second language, and names written in Latin script are mostly ASCII.
Exceptions that I can imagine are people from northwestern Africa (where
Arabic is the first language and French the second) and similar
situations. And even in this rather rare case, it's mostly just about
ignoring the accents, and an ASCII-only variant would be exactly the same
as the name with the accents removed. (German, where "ü" is written "ue"
as a fallback, is a rare exception.)
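To make the "ASCII-only is the same as removing the accents" point concrete, here is a minimal Python sketch. The helper name strip_accents is hypothetical and not part of any proposal; it simply decomposes text with Unicode NFD and drops the combining marks. It also shows the limits mentioned above: it cannot produce the German "ue" fallback, and letters without a canonical decomposition, such as ß, pass through unchanged.

```python
import unicodedata

def strip_accents(name: str) -> str:
    """Hypothetical helper: remove diacritics by decomposing to NFD and
    dropping combining marks. Letters with no canonical decomposition
    (e.g. ß, Ð, Þ) are left as they are, so the result is not always ASCII."""
    decomposed = unicodedata.normalize("NFD", name)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))

print(strip_accents("Huáwéi Jìshù Yǒuxiàn Gōngsī"))  # Huawei Jishu Youxian Gongsi
print(strip_accents("Jürgen Schönwälder"))           # Jurgen Schonwalder (not "Juergen")
print(strip_accents("Straße"))                       # Straße (ß has no decomposition)
```

The Pinyin tone marks and the German umlauts all disappear the same way; only the "ue"-style fallback would need language-specific rules.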
I also think it would be unfair to allow non-ASCII Latin in the
(primary) name but not in an equivalent for a non-Latin name: Germans
would get their umlauts, but Chinese authors would not be allowed to use
tone marks even if they wanted to (which, as far as I can tell from the
RFCs I have seen, has not been the case so far).
There has been some discussion (including some in private) about which
parts of the Latin script should be allowed. I'll summarize the
discussion here, and give my own opinion(s).
Besides Basic Latin (i.e. A-Z and a-z), there was a proposal to allow
only U+00C0 to U+00FF (the proposal said U+00CA through U+00FF, but
U+00CA must be a typo). This is the letter portion of the Latin-1
Supplement block (which as a whole spans U+0080 to U+00FF). It would
cover only Western European languages, which would be a relic of the
Cold War.
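As an illustration of what that proposal would accept, here is a hypothetical Python check (the function name and the decision to ignore non-letters such as spaces and hyphens are assumptions, not part of the proposal). Applied to the Pinyin example from this thread, it shows that tone-marked names fall outside the range, because ǒ (U+01D2), ō (U+014D) and ī (U+012B) live in Latin Extended-A/B.

```python
def fits_latin1_proposal(name: str) -> bool:
    """Hypothetical check: every letter must be Basic Latin (A-Z, a-z) or
    fall in U+00C0..U+00FF. Non-letters (spaces, hyphens, and also the
    symbols U+00D7 and U+00F7 inside that range) are skipped, which is an
    assumption about how such a rule would treat them."""
    for ch in name:
        if not ch.isalpha():
            continue  # assumption: only letters are constrained
        if ("A" <= ch <= "Z") or ("a" <= ch <= "z") or (0xC0 <= ord(ch) <= 0xFF):
            continue
        return False
    return True

print(fits_latin1_proposal("Patrik Faltström"))             # True
print(fits_latin1_proposal("Jürgen Schönwälder"))           # True
print(fits_latin1_proposal("Huáwéi Jìshù Yǒuxiàn Gōngsī"))  # False
```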
There was also a proposal to say that "non-Latin script, or in a Latin
script including unusual characters, should be accompanied by an
equivalent in normal Latin script." The intent seems clear, but "unusual
characters" is highly subjective, and "normal", with a potential range
of not very flattering opposites, can sound highly judgemental.
There was also a (somewhat implicit) proposal to allow Basic Latin
letters with accents, overstrikes, and so on, but not different base
shapes. https://en.wikipedia.org/wiki/List_of_Latin-script_letters has a
very long list of actually used letters, with usage notes. The main
sections of interest to us are Other Letters, Letters with Diacritics,
and Digraphs and Ligatures. We could say that we allow all Letters with
Diacritics as long as they have a Basic Latin base. But that would
exclude German ß (called sharp s or sz). [The ij ligature (U+0133) was
also mentioned, but it's correctly classified as a stylistic ligature,
so it doesn't really get used in text, and it doesn't have to be
distinguished from simple "ij".]
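One way such a "Basic Latin base" rule might be tested mechanically is via canonical decomposition (NFD). The following Python sketch is hypothetical and only an approximation: it accepts a letter if it decomposes into one ASCII letter followed only by combining marks. It rejects ß, ð and the ĳ ligature, as discussed above, but note that it also rejects overstruck letters such as ø or ł, because those have no canonical decomposition at all, so a real rule covering "overstrikes" would need more than NFD.

```python
import unicodedata

def has_basic_latin_base(ch: str) -> bool:
    """Hypothetical test for 'letter with diacritics on a Basic Latin base':
    the NFD form must be one ASCII letter followed only by combining marks.
    Caveat: overstruck letters (e.g. ø, ł) have no canonical decomposition,
    so this approximation rejects them."""
    if not ch:
        return False
    decomposed = unicodedata.normalize("NFD", ch)
    base, marks = decomposed[0], decomposed[1:]
    return (base.isascii() and base.isalpha()
            and all(unicodedata.combining(m) for m in marks))

for ch in ["ö", "ǒ", "ß", "ð", "\u0133", "ø"]:
    print(ch, has_basic_latin_base(ch))  # True True False False False False

# The ij ligature only has a compatibility decomposition, which NFKD applies:
print(unicodedata.normalize("NFKD", "\u0133"))  # ij
```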
In the Latin script, there is a very strong long-tail phenomenon: a few
letters account for almost all usage, and frequency drops off quickly
for the rarer ones. Also, most of the 'Other Letters', i.e. those that
don't have a Basic Latin letter as a base, are used for IPA or other
phonetic systems or for historic languages. So my opinion is that even if
we allow all of the Latin script, the chance that we'll see a German ß or
an Icelandic Ð/ð/Þ/þ, which are all in Latin-1 (U+00C0 to U+00FF), is
way, way higher than the chance of seeing any other "unknown shape
classified nevertheless as Latin". So we might as well leave it at
"Latin script", and enjoy a 'weird' unknown character once in 10,000
RFCs or so.
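For comparison, a plain "Latin script" rule could be sketched like this. This is a rough, hypothetical approximation: Python's standard unicodedata module does not expose the Unicode Script property, so the sketch guesses Script=Latin from the character name; a proper check would use the Script property data (Scripts.txt) instead. Combining marks and other non-letters are skipped, which is an assumption. Note that IPA letters such as ʃ also count as Latin under this rule.

```python
import unicodedata

def looks_latin(ch: str) -> bool:
    # Rough heuristic (assumption): approximate Script=Latin by the character
    # name. Misses e.g. FULLWIDTH LATIN letters; use Scripts.txt for real checks.
    return unicodedata.name(ch, "").startswith("LATIN ")

def name_is_latin_script(name: str) -> bool:
    """Hypothetical policy check: every letter must look Latin.
    Non-letters (spaces, hyphens, combining marks) are ignored."""
    return all(looks_latin(ch) for ch in name if ch.isalpha())

print(name_is_latin_script("Huáwéi Jìshù Yǒuxiàn Gōngsī"))  # True
print(name_is_latin_script("Straße Þ ð ʃ"))                # True
print(name_is_latin_script("华为技术有限公司"))             # False
```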
Regards, Martin.
--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]