Jean, Martin,

One set of comments here illustrates why I have been urging caution in making seemingly innocuous changes in response to seemingly innocuous comments/suggestions...

--On Thursday, October 30, 2025 13:10 -0500 Jean Mahoney <[email protected]> wrote:

>>> Current RPC operational procedure: Postal addresses are not
>>> required in RFCs; however, if one is provided, the RPC will
>>> update a country name to match the English short name for the
>>> country found here: https://www.iso.org/obp/ui/#search. This is
>>> specified in the RFC Style Guide:
>>> https://www.rfc-editor.org/rfc/rfc7322#section-4.12
>>>
>>> I believe there was already feedback to also include the ASCII
>>> equivalent here.
>>
>> To be exact, this should be a Latin script equivalent, not an
>> ASCII equivalent.
>
> [JM] Ack

Maybe not. A "Latin script equivalent" includes non-ASCII characters used in common (and contemporary) Western European languages and is a useful rule for, e.g., allowing Martin to spell his name correctly. But it is not limited to that. Maybe a rule about ASCII and what Unicode calls the "Latin-1 Supplement" (U+00C0 through U+00FF or maybe even U+00A1 through U+00FF) would work, although even that could lead to issues with dotless-i (U+0131), which can cause NFC to fail unless the language is known, and the Turkish / Romanian font style problem that the Unicode Standard points out.

Closer to English, there is even that question about the language in which RFCs are normally written, a language that is usually considered American English rather than the British variety (consider U+00C6 and U+00E6).

However, as soon as one moves past that contemporary Western European collection and into the rest of Latin script, things can get complicated, with language-specific rules.

Now, why would any sane person want to write their name in a non-Latin script that is unfamiliar to almost all people in the IETF and everyone in the RPC and then create a Latin script "equivalent" (note 1 below) that contains extended Latin characters that might be almost as unfamiliar?
Well, suppose they have studied common phonemes associated with Latin characters and Latin character sequences and concluded that some character in what Unicode calls the "Extended Latin" range matches the pronunciation of part of their name much better than the more common Latin script subset? From their standpoint, that makes perfectly good sense and, because it is their name, it is reasonable to be stubborn about it. From the standpoint of the reader of a future RFC, it would be only a slight exaggeration to suggest such a person should have stuck with the original script and not supplied a Latin script equivalent at all.

FWIW, the same issues can reasonably apply to geographic or company names whose original/normal forms are in some rare indigenous languages and the associated writing systems.

If we get back to principles, our reason (at least as I have understood it) for allowing authors to write their names in whatever form they prefer and/or normally use is to avoid discriminatory behavior and/or to ensure accuracy. We've then asked for an ASCII equivalent to create very high odds that readers of RFCs who are not familiar with the script used could recognize the name (at least as different/distinct from other names written in other unfamiliar scripts) and maybe even make a guess as to how the name would be pronounced. But, if the "equivalent" contains characters whose form is not different enough from other Latin script characters for someone who was not looking carefully to know they are different, or whose phonetic pronunciation cannot easily be guessed, the reasoning for the equivalent fails.

If we could count on IETF participants and (at least the vast majority of) RFC readers being familiar with, e.g., IPA (and IPA actually represented all of the needed phonemes), it would be sensible to require IPA transcriptions of names rather than, or as an alternative to, ASCII equivalents. But I'm guessing we cannot count on that.

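To make the narrower "ASCII plus Latin-1 Supplement" idea concrete, here is a minimal sketch in Python that lists the characters in a proposed equivalent that fall outside such a range, so a human can look at them. The range used (printable ASCII plus U+00A1 through U+00FF), the function name, and the example name are all illustrative assumptions; nothing like this appears in the draft or in RPC procedure, and it does not address the dotless-i or font style issues.

```python
import unicodedata

# Assumed range for illustration only: printable ASCII plus
# U+00A1..U+00FF from the Latin-1 Supplement block.
LATIN1_LOW = 0x00A1
LATIN1_HIGH = 0x00FF


def chars_needing_review(name):
    """Return (char, code point, Unicode name) for each character
    outside the assumed range, in order of appearance."""
    flagged = []
    for ch in name:
        cp = ord(ch)
        in_ascii = 0x20 <= cp <= 0x7E
        in_latin1 = LATIN1_LOW <= cp <= LATIN1_HIGH
        if not (in_ascii or in_latin1):
            flagged.append(
                (ch, "U+%04X" % cp, unicodedata.name(ch, "<unnamed>"))
            )
    return flagged


# "D\u00fcrst" stays inside the assumed range, so nothing is flagged.
print(chars_needing_review("Martin D\u00fcrst"))

# A hypothetical name using U+015F and U+0131 is flagged for review.
for entry in chars_needing_review("Ay\u015fe Y\u0131ld\u0131z"):
    print(entry)
```

A check like this can only flag characters for someone to judge; it cannot decide whether a flagged character is acceptable.
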
If we want to claim to be supportive of a global Internet (and participation in the IETF from all around the world), we need to be sensitive to these issues.

And this, sadly, brings me back to a variation on one of my more recent themes. If we want to say something other than "ASCII equivalent" it would be reasonable to say "equivalent in ASCII characters or other Latin script characters acceptable to the RPC" and explicitly let the RPC make the decisions based on their good judgment. But "Latin script equivalent" doesn't work and, especially in combination with the statement about policy at the end of the first paragraph of 3.1, has at least the appearance of not allowing the RPC to reject the use of extended (or otherwise non-obvious) Latin characters in those interpretations.

best,
   john

Note 1: I think we will come to regret the use of "equivalent" (as in "ASCII equivalent" or "Latin script equivalent") in this document or elsewhere. We have traditionally allowed authors to pick among transliterations (accurate or not), mapping using standard conventions (such as "Duerst"), "English" names similar to the original ones in meaning and/or pronunciation, or just whatever they decide to call themselves in English-speaking environments. Whether any of those are "equivalent" is in the mind of the beholder; some might be and others not. The term used in the draft, "interpretation" is much better but still not perfect.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
