[Rswg] Re: Issue summary from WGLC (homograph attack surface)

Martin J. Dürst Sun, 02 Nov 2025 23:04:05 -0800

Hello John, others,

[replying to your earlier mail, but having read the follow-up answer to Rob about 5h later]

The first part of this mail is about homographs. The term has been used since 2002. If you know a better term, please feel free to propose it. I just feel it's better to have a word when talking about something. I don't see any danger of confusion on this mailing list.

I wasn't worried about homographs, but it took me some time to find the main reason why. The main reason is that for names in RFCs, they don't constitute an attack surface.

[I'm not a security expert, so maybe 'attack surface' is the wrong word; if you have a better one, please tell me.]

The reason is simple: If we have both paypal.com and раураӏ.com (the first all Latin, the second all Cyrillic), there is a clear danger of spoofing. But if we have an author named 'P. Paypal' and another author named P. Раураӏ (R. Raura') (*) (again the first all Latin, the second all Cyrillic), that's not worse than having two authors named John Smith.

[(*) I used "Raupai" as a transcription in an earlier mail, which was wrong because the second Cyrillic 'р' also corresponds to a Latin 'r' and because the last letter (which I see as an upper case 'I', but is actually lower case and could look more like a 'l' or '1') doesn't stand for an 'i'.]

Having two authors named John Smith isn't an ideal situation, but I'm sure the authors and the RPC would be able to handle this, if it ever comes up.

What's more important to understand here is that there is no incentive for anybody to falsely pretend to be a second John Smith just to get a leg up on the first John Smith or somebody else. Why would somebody want to write an RFC as John Smith when their real name is Bill Miller?

Somebody may want to create a website offering consulting, pretending to be John Smith and having written that RFC, but that's already possible today. According to my limited knowledge of DNS, e.g. klensin.com is still up for grabs. smith.com isn't, but of course not because of an RFC author named Smith.

So while homograph attacks are a thing to be careful about in the DNS, and therefore in the URI handling code of browsers, they are not something to worry about in RFC names.



[more below]

On 2025-11-02 05:18, John C Klensin wrote:

--On Friday, October 31, 2025 17:26 +0900 "Martin J. Dürst"
<[email protected]> wrote:

3. Whether the policy is aimed "for the reader" or "for the
author":  Consensus seems to me that the doc should say something
about authors.  Some explicit support from Brian's suggestion in
<https://mailarchive.ietf.org/arch/msg/rswg/zF2-lBMYYDPj-igQivMo3O5ivWo>.
Might also want something saying, "The RPC style guide will
define which characters authors may use and how."


As long as the style guide is an RFC, or something with similar
change rate, I think this is way too inflexible. We already have
successful use of non-ASCII characters at least in the Latin script
that were used without any explicit guidance.


I think the plan is to make the Style Guide more of a web page or set
of them, rather than publishing it as an RFC.  I hope the RPC (or
Style Guide approval mechanism if something else) will recognize that
too-frequent changes can cause general confusion and harm to authors
but, otherwise, I agree.  More about this below.

I think we should try to write this draft/RFC under the assumption that things may change, but they may as well stay as is, or may stay as is longer than we hoped.

If the document were consistent about that, this would work for me.
And, again, I have no problem pushing the whole discussion to the
Style Manual as long as (as you indicated) it isn't too static _and_
whatever is said in this document not be misleading or confuse
things.   For this case in particular, I'd rather see the whole
name/example distinction go to the Style Guide because there are some
special nuances there, ones that might evolve as understanding
increases.

I don't mind discussing how to handle nuances. But I'm definitely against punting on some basic guidelines just because there might be "special nuances".

8. JCK's 4(a) - 4(c) on NFC, directionality, naming: No discussion
so  far, but again, with chair hat off, this sounds like style
guide  material, not policy.


In particular with respect to (4c), I'd argue that there's *nobody*
with a name such as Cyrillic "раураӏ".

(Just in case there were, it would be required to also have a Latin
script equivalent (most probably something like "Raupai"), at which
point it would be clear that it's not Latin script, and any
interested user could cut-and-paste it into a tool that would
reveal the exact code points if needed.)
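
A minimal sketch of what such a tool might do, assuming nothing beyond Python's standard unicodedata module. The helper name describe() is hypothetical, and the string is written with escapes so that the result does not depend on any font:

```python
import unicodedata

def describe(text):
    """Return one (code point, Unicode name) pair per character of text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# The all-Cyrillic string discussed in this thread, spelled out as escapes.
cyrillic = "\u0440\u0430\u0443\u0440\u0430\u04cf"

for code_point, name in describe(cyrillic):
    print(code_point, name)
```

Every line of the listing starts with CYRILLIC SMALL LETTER, and the last character comes out as U+04CF CYRILLIC SMALL LETTER PALOCHKA, which is why a transcription ending in 'i' does not fit.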


The Cyrillic paypal example was chosen, not because it was a
realistic name but because it is extremely familiar to many of those
who might be reading this discussion and/or the final document.
However, and probably sadly, you have just made my point (or three of
them):

(i)  The document says "names".  While it distinguishes
        between names of authors and names of companies and
        geographic entities, it does not draw further distinctions.
        I.e., it does not clearly distinguish among, e.g., personal
        (or family, etc.) names of authors or editors (of documents
        and maybe in references), organization names in those
        contexts, section titles and document titles in references,
        or even names in examples or running text or quotations.
        Increasingly broad readings of "names" along those lines
        increase the odds of just such a string appearing.   Such
        lack of precision about the category is a problem and, in
        particular, "раураӏ" (with or without something like
        "Raurai") might plausibly occur in some of them, even if not
        the first.  "paypal" (ASCII) is certainly a company name and
        "раураӏ" (Cyrillic) might be too, but the document
        makes the presence or absence of an ASCII interpretation a
        matter of discretion of the author (I trust the RPC there,
        but your separate comments suggest that you see the point).
        In particular, see the overlap between this and your comment
        about company names in your other note, so we might be close
        to agreement on the subject after all.

[repeating myself] My understanding is that the document currently does not, and should not, make the presence of a Latin equivalent a matter of author discretion.

If you think 'names' isn't clear enough, I'm happy to discuss text that makes this clearer.

(ii) A construction like "name (something)" does not imply
        only "'name' not in Latin script" but could also be, e.g.,
        "'string that might be a name' followed by a pronunciation
        hint or explanation".  Consider, e.g., "King Charles II (of
        France)". Because many people get the pronunciation wrong, I
        might even want to write "Klensin" in text followed by a
        phonetic alphabet presentation.  So that construction does
        not automatically imply that the name is other than Latin
        script.

This is just a minor side issue, but I'm looking forward to "King Charles II (of France)", or for that matter "King Charles III (of the United Kingdom)" to write his first RFC.

(iii) "раураӏ" (or, worse, "раүраӏ") is immediately
        recognizable as Cyrillic (or not) depending on
        the renderer's choice of display type styles or
        fonts (something over which we have little control) and, even
        then, far more easily by those who are sensitive to such things
        than those who are not (reader distinctions over which we
        have even less control).  Obvious to you and me might not be
        obvious to a reader and, depending on the script, might not
        even be obvious to the RPC.

Homograph attacks of course assume that there is no difference, or the difference is too small to be recognized.
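
Where the eye cannot tell, the code points still can. The following is a rough sketch, not a real confusables check: Python's standard library does not expose the Unicode Script property, so the hypothetical helper scripts_used() falls back on the first word of each character name, which is only a heuristic. The second Cyrillic string is assumed to differ from the first only by U+04AF in third position.

```python
import unicodedata

def scripts_used(text):
    """Heuristic: collect the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }

latin = "paypal"
cyrillic = "\u0440\u0430\u0443\u0440\u0430\u04cf"      # all-Cyrillic look-alike
cyrillic_alt = "\u0440\u0430\u04af\u0440\u0430\u04cf"  # variant, assumed U+04AF

for s in (latin, cyrillic, cyrillic_alt):
    # Print the scripts found and whether the string equals the Latin one.
    print(sorted(scripts_used(s)), s == latin)
```

Both Cyrillic strings report a single script and neither compares equal to the Latin one, whatever a given font makes them look like. A string mixing the two scripts would report both, but a whole-script look-alike such as these passes any mixed-script test, so the check distinguishes strings rather than detecting spoofing.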

And, of course, none of that addresses the directionality issues.

I'm not aware of any serious directionality issues. If a name is RTL (e.g. Arabic or Hebrew script), then if it's inline (e.g. in an Acknowledgement section), the Unicode bidi algorithm should just take care of it. If it's in the header, it also should work, even in the ASCII version. If it's alone on a single line, such as in the Author's Address section, an LRM (left-to-right mark) may be needed at the start in the ASCII version to keep the name left-justified. In HTML, that can be solved with CSS.
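
The LRM case can be sketched as follows. This is an illustration under assumptions, not a description of any RPC tooling: the helper names are hypothetical, and the direction test is a simplified version of the first-strong-character rule that the Unicode bidi algorithm uses to pick a paragraph direction when none is given (it ignores isolates and embeddings). Whether a particular viewer aligns a line that way is itself an assumption.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

def first_strong_direction(text):
    """Return "L" or "R" for the first strongly directional character, else None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type strong characters
            return "R"
    return None

def anchor_left(line):
    """Prefix an LRM when a standalone line would otherwise start right-to-left."""
    return LRM + line if first_strong_direction(line) == "R" else line

# Hypothetical standalone line holding a Hebrew-script name.
hebrew_name = "\u05d3\u05d5\u05d3"
fixed = anchor_left(hebrew_name)
print(first_strong_direction(hebrew_name), first_strong_direction(fixed))
```

A line that already starts with a left-to-right letter is returned unchanged, so the mark is only added where it may matter.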

All of this could be taken as strong arguments for moving far more of
the discussion to the Style Guide but then to be sure this document
and the Style Guide do not diverge (or even appear to do so) and
probably to include explicit pointers to the Style Guide for details.

9. JCK's 5 on making a list of scripts and languages: No
discussion.  Silence is not a good basis on which to judge
consensus.


Already said so above, but I think this makes things too
inflexible. If the RPC really feels it would help them, they can
always start such a list, but there's also a danger this would be
interpreted as exclusionary (somebody claiming somewhere "RFCs can
be written by Chinese and Japanese, but not Koreans" just because
the RPC didn't yet have a case of a Korean author and therefore
didn't yet put Korean/Hangul in the list).


I'm not sure I understand the lack of flexibility you are seeing.  I
did not propose making that list part of an RFC, nor even of a more
easily updated Style Guide, but simply a list, updated whenever the
RPC considers that appropriate.  Under any circumstances I can easily
imagine, it would be updated only by adding languages and/or scripts,
not removing them (unless, I suppose, a language or script about
which they thought they were confident turned out to be more
problematic than they had assumed, but, while I wouldn't want to
prohibit that, I'd expect it to be so rare as to be irrelevant).  You
seem to have inferred a  "you can't write text in something that is
not on the list" situation.  I never intended that.

I'm not claiming that you intended it. What I wrote is that some third party may interpret it that way. Or they may think that it would be a major hassle to be the first to use a particular script or language.

Instead, think
about it as a convenience for authors and reviewers, especially
document shepherds.  If a language/script combination, or, where
relevant, just a script, are on the list, people in the document
development process can have reasonable assurance that the text will
be handled smoothly and efficiently.   If it isn't, then that should
serve as a recommendation to consult with the RPC earlier in the
process than handoff from the stream.  If that recommendation were
ignored, the authors and stream should expect the possibility of
delays in processing as the RPC checks the text strings and finds
advice about them if needed.

It may not only be about scripts/languages, but also about specific (groups of) characters. See also the discussion of Latin script in a separate thread.

There are a lot of scripts/languages where most if not all characters are highly unproblematic. There are some scripts where most characters are highly unproblematic, but some may be tricky in some situations. It can be expected that the authors should be familiar with the issues, either because it's about their names or because it's in their examples. The examples will be there for a purpose. An example to show some specific bidi issue in a protocol may need different treatment from a name, even if both are in the same language and script.

The RPC certainly is good about asking authors. Shepherds/chairs/ADs will also either have the relevant knowledge, or will have asked questions, or will have read the answers to questions from others. Issues with non-ASCII examples, etc., should have surfaced long before going to the RPC, and it's just a matter of telling the RPC about these if there are any. There shouldn't be any significantly longer delays than for other issues when publishing an RFC (of which we all know there are many).


Regards,    Martin.

Of course, if that function is not explained clearly, it might be
misinterpreted in the way you suggest, but I think that would be a
problem with the explanation, not the mechanism.

Maybe it is a useless idea if neither the community nor the RFC
Editor System care about long and seemingly unpredictable delays in
handling some documents, delays that unnecessarily make average
processing times longer.  Personally, I think predictability is A
Good Thing and that the perception of relatively rapid processing for
most documents is too, but perhaps I'm in the minority.

Please add the issues from my mail today (Date: Fri, 31 Oct 2025
16:57:49 +0900) to Paul and the WG.


FWIW, I think we are in general agreement about all of those issues
except possibly about the boundary between this document and RPC
choices about the Style Guide.  Your final comment (about names and
initials) may suggest something else entirely, which is that this WG
and/or the RSAB, ought to have a mechanism for making strong,
community-consensus, recommendations to the RPC (and the RPC should
have a mechanism for asking questions and getting such
recommendations) without needing to republish an updated version of
this document or other Editorial Stream RFCs.

thanks,
    john


