Jean, Martin,

One set of comments here illustrates why I have been urging caution in
making seemingly innocuous changes in response to seemingly innocuous
comments/suggestions...

--On Thursday, October 30, 2025 13:10 -0500 Jean Mahoney
<[email protected]> wrote:

>>> Current RPC operational procedure:  Postal addresses are not
>>> required  in RFCs; however, if one is provided, the RPC will
>>> update a country  name to match the English short name for the
>>> country found here:  https://www.iso.org/obp/ui/#search. This is
>>> specified in the RFC  Style Guide:
>>> https://www.rfc-editor.org/rfc/rfc7322#section-4.12
>>> 
>>> I believe there was already feedback to also include the ASCII 
>>> equivalent here.
>> 
>> To be exact, this should be a Latin script equivalent, not an
>> ASCII  equivalent.
> 
> [JM] Ack

Maybe not. A "Latin script equivalent" includes non-ASCII characters
used in common (and contemporary) Western European languages and is a
useful rule for, e.g., allowing Martin to
spell his name correctly.  But it is not limited to that.  Maybe a
rule about ASCII and what Unicode calls the "Latin-1 Supplement"
(U+00C0 through U+00FF or maybe even U+00A1 through U+00FF) would
work, although even that could lead to issues with dotless-i
(U+0131), which can cause NFC to fail unless the language is known,
and the Turkish / Romanian font style problem that the Unicode
Standard points out.  Closer to English, there is even that question
about the language in which RFCs are normally written, a language
that is usually considered American English rather than the British
variety (consider U+00C6 and U+00E6).   However, as soon as one moves
past that contemporary Western European collection and into the rest
of Latin script, things can get complicated, with language-specific
rules.
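
To make the distinction among those ranges concrete, here is a minimal
sketch (not a proposed RPC procedure) that sorts the characters of a
name into ASCII, the wider U+00A1 through U+00FF range mentioned above,
other Latin script, and everything else.  The function names and the
category labels are hypothetical, and treating "Unicode character name
starts with LATIN" as the test for Latin script is an assumption made
only for illustration.  Note that the wider range also takes in symbols
such as U+00A1 and U+00F7, not just letters.

```python
import unicodedata


def classify_char(ch):
    """Return a rough category label for one character (labels are illustrative)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ascii"
    # The wider of the two ranges discussed above: U+00A1 through U+00FF.
    if 0x00A1 <= cp <= 0x00FF:
        return "latin1-supplement"
    # Assumption: a character whose Unicode name begins with "LATIN"
    # is counted as Latin script outside the two ranges above.
    if unicodedata.name(ch, "").startswith("LATIN"):
        return "other-latin"
    return "non-latin"


def classify_name(name):
    """Return the sorted set of categories used by a name, after NFC."""
    # Normalize first so that "u" + combining diaeresis is seen as U+00FC.
    composed = unicodedata.normalize("NFC", name)
    return sorted({classify_char(ch) for ch in composed})


if __name__ == "__main__":
    for sample in ["Duerst", "D\u00fcrst", "\u0131", "\u00c6", "\u0434"]:
        print(repr(sample), classify_name(sample))
```

Under those assumptions, dotless-i (U+0131) lands in the "other-latin"
bucket while U+00C6 stays inside the Latin-1 Supplement range, which is
the boundary the paragraph above is worried about.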

Now, why would any sane person want to write their name in a
non-Latin script that is unfamiliar to almost all people in the IETF
and everyone in the RPC and then create a Latin script "equivalent"
(note 1 below) that contains extended Latin characters that might be
almost as unfamiliar?   Well, suppose they have studied common
phonemes associated with Latin characters and Latin character
sequences and concluded that some character in what Unicode calls the
"Extended Latin" range matches the pronunciation of part of their
name much better than the more common Latin script subset?  From
their standpoint, that makes perfectly good sense and, because it is
their name, it is reasonable to be stubborn about it.  From the
standpoint of the reader of a future RFC, it would be only a slight
exaggeration to suggest such a person should have stuck with the
original script and not supplied a Latin script equivalent at all.
FWIW, the same issues can reasonably apply to geographic or company
names whose original/normal forms are in some rare indigenous
languages and the associated writing systems.

If we get back to principles, our reason (at least as I have
understood it) for allowing authors to write their names in whatever
form they prefer and/or normally use is to avoid discriminatory
behavior and/or to ensure accuracy. We've then asked for an ASCII
equivalent to create very high odds that readers of RFCs who are not
familiar with the script used could recognize the name (at least as
different/distinct from other names written in other unfamiliar
scripts) and maybe even make a guess as to how the name would be
pronounced.  But, if the "equivalent" contains characters whose form
is not different enough from other Latin script characters for
someone who was not looking carefully to know they are different or
whose phonetic pronunciation cannot easily be guessed, the reasoning
for the equivalent fails. 

If we could count on IETF participants and (at least the vast
majority of) RFC readers being familiar with, e.g., IPA (and IPA
actually represented all of the needed phonemes), it would be
sensible to require IPA transcriptions of names rather than, or as an
alternative to, ASCII equivalents.   But I'm guessing we cannot count
on that.

If we want to claim to be supportive of a global Internet (and
participation in the IETF from all around the world), we need to be
sensitive to these issues.

And this, sadly, brings me back to a variation on one of my more
recent themes.  If we want to say something other than "ASCII
equivalent" it would be reasonable to say "equivalent in ASCII
characters or other Latin script characters acceptable to the RPC"
and explicitly let the RPC make the decisions based on their good
judgment.  But "Latin script equivalent" doesn't work and, especially
in combination with the statement about policy at the end of the
first paragraph of 3.1, has at least the appearance of not allowing
the RPC to reject the use of extended (or otherwise non-obvious)
Latin characters in those interpretations.

best,
   john

Note 1: I think we will come to regret the use of "equivalent" (as in
"ASCII equivalent" or "Latin script equivalent") in this document or
elsewhere.   We have traditionally allowed authors to pick among
transliterations (accurate or not), mapping using standard
conventions (such as "Duerst"), "English" names similar to the
original ones in meaning and/or pronunciation, or just whatever they
decide to call themselves in English-speaking environments.  Whether
any of those are "equivalent" is in the mind of the beholder; some
might be and others not.  The term used in the draft,
"interpretation", is much better but still not perfect.
 

-- 
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]