Hello John, others,
[replying to your earlier mail, but having read the follow up answer to
Rob about 5h later]
The first part of this mail is about homographs. The term has been used
since 2002. If you know a better term, please feel free to propose it. I
just feel it's better to have a word when talking about something. I
don't see any danger of confusion on this mailing list.
I wasn't worried about homographs, but it took me some time to find the
main reason why. The main reason is that homographs in names in RFCs
don't constitute an attack surface.
[I'm not a security expert, so maybe 'attack surface' is the wrong word;
if you have a better one, please tell me.]
The reason is simple: If we have both paypal.com and раураӏ.com (the
first all Latin, the second all Cyrillic), there is a clear danger of
spoofing. But if we have an author named 'P. Paypal' and another author
named P. Раураӏ (R. Raura') (*) (again the first all Latin, the second
all Cyrillic), that's not worse than having two authors named John Smith.
[(*) I used "Raupai" as a transcription in an earlier mail, which was
wrong because the second Cyrillic 'р' also corresponds to a Latin 'r'
and because the last letter (which I see as an upper case 'I', but is
actually lower case and could look more like a 'l' or '1') doesn't stand
for an 'i'.]
Having two authors named John Smith isn't an ideal situation, but I'm
sure the authors and the RPC would be able to handle this, if it ever
comes up.
What's more important to understand here is that there is no incentive
for anybody to falsely pretend to be a second John Smith just to get a
leg up on the first John Smith or somebody else. Why would somebody want
to write an RFC as John Smith when their real name is Bill Miller?
Somebody may want to create a website offering consulting, pretending to
be John Smith and having written that RFC, but that's already possible
today. According to my limited knowledge of DNS, e.g. klensin.com is
still up for grabs. smith.com isn't, but of course not because of an RFC
author named Smith.
So while homograph attacks are a thing to be careful about in the DNS,
and therefore in the URI handling code of browsers, they are not
something to worry about in RFC names.
[more below]
On 2025-11-02 05:18, John C Klensin wrote:
--On Friday, October 31, 2025 17:26 +0900 "Martin J. Dürst"
<[email protected]> wrote:
3. Whether the policy is aimed "for the reader" or "for the
author": Consensus seems to me that the doc should say something
about authors. Some explicit support from Brian's suggestion in
<https://mailarchive.ietf.org/arch/msg/rswg/zF2-lBMYYDPj-igQivMo3O5ivWo>.
Might also want something saying, "The RPC style guide will
define which characters authors may use and how."
As long as the style guide is an RFC, or something with similar
change rate, I think this is way too inflexible. We already have
successful use of non-ASCII characters at least in the Latin script
that were used without any explicit guidance.
I think the plan is to make the Style Guide more of a web page or set
of them, rather than publishing it as an RFC. I hope the RPC (or
Style Guide approval mechanism if something else) will recognize that
too-frequent changes can cause general confusion and harm to authors
but, otherwise, I agree. More about this below.
I think we should try to write this draft/RFC under the assumption that
things may change, but they may as well stay as is, or may stay as is
longer than we hoped.
If the document were consistent about that, this would work for me.
And, again, I have no problem pushing the whole discussion to the
Style Manual as long as (as you indicated) it isn't too static _and_
whatever is said in this document is not misleading and does not confuse
things. For this case in particular, I'd rather see the whole
name/example distinction go to the Style Guide because there are some
special nuances there, ones that might evolve as understanding
increases.
I don't mind discussing how to handle nuances. But I'm definitely
against punting on some basic guidelines just because there might be
"special nuances".
8. JCK's 4(a) - 4(c) on NFC, directionality, naming: No discussion
so far, but again, with chair hat off, this sounds like style
guide material, not policy.
In particular with respect to (4c), I'd argue that there's *nobody*
with a name such as Cyrillic "раураӏ".
(Just in case there were, it would be required to also have a Latin
script equivalent (most probably something like "Raupai"), at which
point it would be clear that it's not Latin script, and any
interested user could cut-and-paste it into a tool that would
reveal the exact code points if needed.)
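As a rough illustration of what such a tool might do, here is a minimal Python sketch using only the standard unicodedata module. The function name describe() is made up for this example, and the Cyrillic string is written with escapes on the assumption that the last letter is U+04CF (small palochka), as described earlier in this mail.

```python
import unicodedata

def describe(text):
    """Return one (code point, Unicode character name) pair per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

latin = "paypal"
# The all-Cyrillic look-alike, spelled out with escapes so that the
# code points are unambiguous (assumption: final letter is U+04CF).
cyrillic = "\u0440\u0430\u0443\u0440\u0430\u04cf"

for label, s in (("Latin", latin), ("Cyrillic", cyrillic)):
    print(label)
    for code_point, name in describe(s):
        print(f"  {code_point}  {name}")
```

Every character name in the second list starts with "CYRILLIC", which is the kind of information an interested reader would get from pasting the string into such a tool.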
The Cyrillic paypal example was chosen, not because it was a
realistic name but because it is extremely familiar to many of those
who might be reading this discussion and/or the final document.
However, and probably sadly, you have just made my point (or three of
them):
(i) The document says "names". While it distinguishes
between names of authors and names of companies and
geographic entities, it does not draw further distinctions.
I.e., it does not clearly distinguish among, e.g., personal
(or family, etc.) names of authors or editors (of documents
and maybe in references), organization names in those
contexts, section titles and document titles in references,
or even names in examples or running text or quotations.
Increasingly broad readings of "names" along those lines
increase the odds of just such a string appearing. Such
lack of precision about the category is a problem and, in
particular, "раураӏ" (with or without something like
"Raurai") might plausibly occur in some of them, even if not
the first. "paypal" (ASCII) is certainly a company name and
"раураӏ" (Cyrillic) might be too, but the document
makes the presence or absence of an ASCII interpretation a
matter of discretion of the author (I trust the RPC there,
but your separate comments suggest that you see the point).
In particular, see the overlap between this and your comment
about company names in your other note, so we might be close
to agreement on the subject after all.
[repeating myself] My understanding is that the document currently does
not, and should not, make the presence of a Latin equivalent a matter of
author discretion.
If you think 'names' isn't clear enough, I'm happy to discuss text that
makes this clearer.
(ii) A construction like "name (something)" does not imply
only "'name' not in Latin script" but could also be, e.g.,
"'string that might be a name' followed by a pronunciation
hint or explanation". Consider, e.g., "King Charles II (of
France)". Because many people get the pronunciation wrong, I
might even want to write "Klensin" in text followed by a
phonetic alphabet presentation. So that construction does
not automatically imply that the name is other than Latin
script.
This is just a minor side issue, but I'm looking forward to "King
Charles II (of France)", or for that matter "King Charles III (of the
United Kingdom)", writing his first RFC.
(iii) "раураӏ" (or, worse, "раүраӏ") is immediately
recognizable as Cyrillic (or not) depending on
the renderer's choice of display type styles or
fonts (something over which we have little control) and, even
then, far more easily by those who are sensitive to such things
than those who are not (reader distinctions over which we
have even less control). Obvious to you and me might not be
obvious to a reader and, depending on the script, might not
even be obvious to the RPC.
Homograph attacks of course assume that there is no difference, or the
difference is too small to be recognized.
And, of course, none of that addresses the directionality issues.
I'm not aware of any serious directionality issues. If a name is RTL
(e.g. Arabic or Hebrew script), then if it's inline (e.g. in an
Acknowledgement section), the Unicode bidi algorithm should just take
care of it. If it's in the header, it also should work, even in the
ASCII version. If it's alone on a single line, such as in the Author's
Address section, an LRM (left-to-right mark) may be needed at the start
in the ASCII version to keep the name left-justified. In HTML, that can
be solved with CSS.
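To make the LRM idea concrete, here is a small Python sketch. It assumes a plain-text renderer that picks the paragraph direction from the first strongly directional character, which is the default behaviour in the Unicode bidi algorithm. The helper names starts_rtl() and left_anchor() are made up for this example, and whether the result is actually displayed left-justified still depends on the renderer.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, a strong left-to-right character

def starts_rtl(text):
    """True if the first strongly directional character is RTL (class R or AL)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return True
        if bidi_class == "L":
            return False
    return False  # no strong character at all: nothing to fix

def left_anchor(line):
    """Prefix an LRM if the line would otherwise start as an RTL paragraph."""
    return LRM + line if starts_rtl(line) else line

hebrew_name = "\u05d3\u05d5\u05d3"  # a sample Hebrew-script name
print(repr(left_anchor(hebrew_name)))
print(repr(left_anchor("John Smith")))
```

Only lines that would otherwise be treated as RTL paragraphs get the mark, so Latin-script names are left untouched.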
All of this could be taken as a strong argument for moving far more of
the discussion to the Style Guide, but then we would need to be sure
that this document and the Style Guide do not diverge (or even appear to
do so), and probably to include explicit pointers to the Style Guide for
details.
9. JCK's 5 on making a list of scripts and languages: No
discussion. Silence is not a good basis on which to judge
consensus.
Already said so above, but I think this makes things too
inflexible. If the RPC really feels it would help them, they can
always start such a list, but there's also a danger this would be
interpreted as exclusionary (somebody claiming somewhere "RFCs can
be written by Chinese and Japanese, but not Koreans" just because
the RPC didn't yet have a case of a Korean author and therefore
didn't yet put Korean/Hangul in the list).
I'm not sure I understand the lack of flexibility you are seeing. I
did not propose making that list part of an RFC, nor even of a more
easily updated Style Guide, but simply a list, updated whenever the
RPC considers that appropriate. Under any circumstances I can easily
imagine, it would be updated only by adding languages and/or scripts,
not removing them (unless, I suppose, a language or script about
which they thought they were confident turned out to be more
problematic than they had assumed, but, while I wouldn't want to
prohibit that, I'd expect it to be so rare as to be irrelevant). You
seem to have inferred a "you can't write text in something that is
not on the list" situation. I never intended that.
I'm not claiming that you intended it. What I wrote is that some third
party may interpret it that way. Or they may think that it would be a
major hassle to be the first to use a particular script or language.
Instead, think about it as a convenience for authors and reviewers,
especially document shepherds. If a language/script combination or,
where relevant, just a script is on the list, people in the document
development process can have reasonable assurance that the text will
be handled smoothly and efficiently. If it isn't, then that should
serve as a recommendation to consult with the RPC earlier in the
process than handoff from the stream. If that recommendation were
ignored, the authors and stream should expect the possibility of
delays in processing as the RPC checks the text strings and finds
advice about them if needed.
It may not only be about scripts/languages, but also about specific
(groups of) characters. See also the discussion of Latin script in a
separate thread.
There are a lot of scripts/languages where most if not all characters
are highly unproblematic. There are some scripts where most characters
are highly unproblematic, but some may be tricky in some situations.
Authors can be expected to be familiar with the issues, either because
it's about their names or because it's in their examples.
The examples will be there for a purpose. An example to show some
specific bidi issue in a protocol may need different treatment from a
name, even if both are in the same language and script.
The RPC certainly is good about asking authors. Shepherds/chairs/ADs
will also either have the relevant knowledge, or will have asked
questions, or will have read the answers to questions from others.
Issues with non-ASCII examples, etc. should have surfaced long before
going to the RPC, and it's just a matter of telling the RPC about these
if there are any. There shouldn't be any significantly longer delays
than for other issues when publishing an RFC (of which we all know there
are many).
Regards, Martin.
Of course, if that function is not explained clearly, it might be
misinterpreted in the way you suggest, but I think that would be a
problem with the explanation, not the mechanism.
Maybe it is a useless idea if neither the community nor the RFC
Editor System cares about long and seemingly unpredictable delays in
handling some documents, delays that unnecessarily make average
processing times longer. Personally, I think predictability is A
Good Thing and that the perception of relatively rapid processing for
most documents is too, but perhaps I'm in the minority.
Please add the issues from my mail today (Date: Fri, 31 Oct 2025
16:57:49 +0900) to Paul and the WG.
FWIW, I think we are in general agreement about all of those issues
except possibly about the boundary between this document and RPC
choices about the Style Guide. Your final comment (about names and
initials) may suggest something else entirely, which is that this WG
and/or the RSAB ought to have a mechanism for making strong,
community-consensus recommendations to the RPC (and the RPC should
have a mechanism for asking questions and getting such
recommendations) without needing to republish an updated version of
this document or other Editorial Stream RFCs.
thanks,
john
--
rswg mailing list -- [email protected]