[Rswg] Re: [Ext] Last call comments on draft-rswg-rfc7997bis

Paul Hoffman Tue, 28 Oct 2025 17:33:31 -0700

First off: thanks for the careful review with proposals for better wording! 
Notes below.


On Oct 28, 2025, at 01:35, Martin J. Dürst <[email protected]> wrote:

> I'm not listing minor grammatical mistakes, of which I have found quite a 
> few. These can be dealt with by the RPC.

Feel free to send those to me off-list so that I can make the RPC's job easier. 
(Most of the list doesn't know this, but Martin has helpfully made major and 
minor suggestions on quite a few of my drafts, all to their betterment, for 
more than 25 years.)

> Content, major: The draft needs to say that RFCs are written (mainly/mostly) 
> in English. I know this was discussed, but I haven't seen the main argument, 
> namely that we define policy and that this is policy. And if this isn't 
> policy, then nothing in this draft is.

The WG earlier decided that that policy already belongs to the RFC Editor, and 
is already reflected in their Style Guide. Some of the concern, which I agree 
with, is "what is English?" and whether trying to define that anywhere benefits 
anyone. If at some point, one of the streams wants to publish an RFC that is 
not in English, that stream will have to have a (likely contentious) talk with 
the RFC Editor. To reiterate: 9280bis, which this WG has already approved, 
leaves these types of decisions to the RPC, with the understanding that the RPC 
has been quite transparent with the community when issues have come up.

> Editorial, major: The abstract should be written so that it can be read even 
> in 10 or 20 years, which means it should not contain (and in particular 
> shouldn't start) with historic references. As a start, the first paragraph of 
> the abstract should move to the introduction, and the first two sentences of 
> the introduction should in turn move to the abstract. After that, a bit of 
> cleanup will be needed.

Thanks, I like this. When we start a draft, it's for the WG; by the time it's 
done, it should be for future readers.

> Content, major: Section 2 is entitled "Basic Requirements for Text in RFCs". 
> But the way it's written, it contains requirements for "readers and 
> browsers", people, maybe fonts, and searches. The text should be rewritten to 
> actually talk about text in RFCs. As an example, instead of "RFCs should be 
> displayed correctly across a wide range of readers and browsers.", write 
> "RFCs should only contain text that can be displayed correctly across a wide 
> range of readers and browsers.". Similar for the rest of the section.

Agree.

> Content, major: Section 3: "There are many Unicode characters that obviously 
> cannot be displayed (such as control characters), and many whose ability to 
> be displayed is debatable.": It's unclear what "many whose ability to be 
> displayed is debatable." means. I'd guess it refers to scripts and characters 
> standardized recently, for which font support is still thin. If that's what 
> is meant, please say so; if something else is meant, please make clear what 
> that is.

There is a wide variety of things that can be debatable. Are combining 
characters like U+0315 (COMBINING COMMA ABOVE RIGHT) displayable? What about 
non-spacing marks like U+0650 (ARABIC KASRA)? I am sure people would take each 
side of the debate ("I can see the symbol printed in the Unicode Standard" vs. 
"I can't see that code point on my laptop even though it has quite a complete 
font set" and so on).
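
As a rough illustration of why this is debatable, the minimal Python sketch below prints the properties that the standard library's `unicodedata` module reports for the two code points mentioned above. The helper name `describe` is hypothetical, and the output reflects whatever Unicode version the local Python build ships with. Both characters are non-spacing marks, so they only have a visible shape when attached to a base character.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Return basic Unicode properties for a single character (illustrative helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # General category; "Mn" means non-spacing mark.
        "category": unicodedata.category(ch),
        # Canonical combining class; non-zero for combining marks.
        "combining_class": unicodedata.combining(ch),
    }

# The two code points used as examples in the message above.
for ch in ("\u0315", "\u0650"):
    print(describe(ch))
```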

> Content, major: Section 3 points to BCP137 for various notations. These are 
> all numeric. There are many places where numeric notation is appropriate. But 
> RFC7997 also recommends the use of Unicode character names. I see no reason 
> to change this, as support for this is also available in RFC2XML. In some 
> cases (see also below), character names make an RFC more readable because 
> they reduce additional lookups. (I have nothing against mentioning that in 
> some cases, Unicode character names contain errors, and in these cases, an 
> official alias should be used.)

Yep, there seems to be rough consensus for this on the list, and I'll make that 
change.

> Content, major (same paragraph): "If an RFC includes such characters in 
> normative or descriptive text, the RFC needs to also clearly describe the 
> character.": There may be cases, in particular for the correct display of 
> examples including bidirectional text in plain text, where we want to use 
> bidi control characters but we do not want to "describe" them (because they 
> are not needed in HTML or PostScript).

Why would we not want to describe them? We are quite sure that some people 
reading the RFC will have them displayed R-to-L, and others L-to-R.

> Content, major: 3.1 Names: This section confuses ASCII and Latin script. If 
> you look at recent RFCs such as RFC 9694 (sorry, that was just the example 
> that was easiest for me to find), the name is there in Latin script (M.J. 
> Dürst at the top, Martin J. Dürst at the end), without an "ASCII 
> interpretation". And there would be no point to force me to add an "ASCII 
> interpretation" next time I write an RFC. So please change "These authors can 
> give their names using only ASCII characters, or as Unicode characters and an 
> ASCII interpretation of their name." to
> "Authors can give their names using only Latin script characters, or using 
> non-Latin script and an equivalent in Latin script." Please note that this 
> includes e.g. somebody (fictional) with a name of 加藤 竜太郎 with a Latin (not 
> ASCII) equivalent of Ryūtarō Katō (if the person prefers this to the simpler 
> Ryutaro Kato). Please also note that I'm using "equivalent", not 
> "interpretation". There's no interpretation involved.

Yep, good change.

> Editorial, medium: Please remove "Authors of RFCs whose names include 
> non-ASCII characters will likely have preferences for how their names are 
> displayed based on their lived experiences." People, including authors, just 
> have names.

I fully disagree that authors don't have preferences. In fact, at various times 
in the past, you have had different preferences about the spelling of your 
surname in IETF documents. :-) In particular, some authors with Han / Kanji 
names have asked that their names be spelled with Latin characters, others have 
asked for their names to only be spelled with Han / Kanji, and yet others want 
both (often with the Latin of their family name in all caps). These are 
preferences that I think should be acknowledged and honored when sensible, even 
if that bugs some other people.

> Content, major: "Company names and geographic names generally do not need 
> ASCII interpretations, but they can be included at the discretion of the 
> author and the RPC.": This would mean that I could give my affiliation as 
> 青山学院大学 and my address as 相模原、日本 or so, but it surely can't be what we want.

If that's what the author of an RFC and their stream manager want, then it is 
indeed what we want. The RPC can disagree, but that disagreement is on a 
case-by-case basis, not colored by this document.

> Content, major: RFCs currently use last (family) name plus initial(s) in many 
> places, and we should change this (as a matter of policy if necessary). The 
> reason is that there are many people where the family name isn't very 
> informative. This is very frequent for Koreans, Chinese, and Danish. It can 
> also happen in other cultures.

I fully agree, but that's a topic for the Style Guide, not this document. If 
you start a thread about this on rfc-interest@, I would certainly participate.

> Editorial, minor: 3.2 Examples: "giving the Unicode equivalent of the 
> non-ASCII characters": This is confusing because these characters will be in 
> UTF-8 and therefore will use Unicode. What we want to say is to use Unicode 
> code points or Unicode character names.

Yep, good catch.

> Editorial, major: When talking about color, the text says "If so, those 
> examples need to also include the "U+NNNN" syntax.". This excludes the 
> possibility to use Unicode character names. But as has been discussed in 
> previous mail, in the example at hand, it would be much more helpful for the 
> reader to replace 'For example, "A color display should be able to 
> differentiate 🔴 (U+1F534), 🟢 (U+1F7E2), and 🔵 (U+1F535)."' with 'For example, 
> "A color display should be able to differentiate 🔴 (LARGE RED CIRCLE), 🟢 
> (LARGE GREEN CIRCLE), and 🔵 (LARGE BLUE CIRCLE).", because it saves somebody 
> with a black-and-white display some lookups.

Yep, there has been a lot of agreement on the list about using names and U+NNNN 
here.
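
A minimal Python sketch of what "names and U+NNNN" could look like for the color example follows. The function name `annotate` and the exact layout of the parenthetical are assumptions made for illustration, not a format taken from the draft or from BCP137.

```python
import unicodedata

def annotate(ch: str) -> str:
    """Render a character followed by its code point and Unicode name (illustrative layout)."""
    # unicodedata.name() returns the fallback when the local Unicode data has no name.
    name = unicodedata.name(ch, "<unnamed>")
    return f"{ch} (U+{ord(ch):04X} {name})"

# The three circles from the example sentence under discussion.
for ch in ("\U0001F534", "\U0001F7E2", "\U0001F535"):
    print(annotate(ch))
```

With both notations present, a reader whose display cannot show the glyph or the color still learns which character was meant without a separate lookup.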

> Content, major: 5. Security: "Valid Unicode that matches the expected text 
> must be verified in order to preserve expected behavior and protocol 
> information.": It's totally unclear what this means, and who should deal with 
> it. Maybe this should read "Authors and the RPC should cross-check that the 
> used characters match their code point numbers or Unicode character names." 
> If something else is intended, please make clearer what it is.

I think that is what was intended, and your wording is clearer.
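
The cross-check described above could be partly automated. The sketch below is only an assumption about how such a check might work: it looks for the pattern "character, space, (U+NNNN)" in a string and reports pairs where the character does not match the stated code point. The name `find_mismatches` and the pattern are hypothetical. It handles only single code points, not sequences, and does not check character names.

```python
import re

# One character, a space, then a parenthesized code point such as (U+1F534).
ANNOTATION = re.compile(r"(.) \(U\+([0-9A-F]{4,6})\)")

def find_mismatches(text: str) -> list:
    """Return (character, stated code point) pairs where the two disagree."""
    mismatches = []
    for match in ANNOTATION.finditer(text):
        ch, stated = match.group(1), match.group(2)
        if ord(ch) != int(stated, 16):
            mismatches.append((ch, f"U+{stated}"))
    return mismatches

good = "\U0001F534 (U+1F534), \U0001F7E2 (U+1F7E2), and \U0001F535 (U+1F535)"
bad = "\U0001F534 (U+1F535)"  # red circle labeled with the blue circle's code point
print(find_mismatches(good))
print(find_mismatches(bad))
```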

> Editorial, minor: The reference label "[UnicodeCurrent]" should be changed to 
> "[UnicodeLatest]", because that will help people who are familiar with 
> Unicode terminology.

Excellent!

> In the reference section, the year should be removed because that's how the 
> Unicode Consortium advises to cite the latest version, see e.g. "Version 
> References" at https://www.unicode.org/versions/Unicode17.0.0/. If the RFC 
> Editor doesn't allow to remove the year, then at least 2025 should be used 
> (currently 2023).

Agree, but it would be even better with no year. The RPC has a references 
specialist (hi, Ted!), and I'm sure that he would be interested in this. This 
is a topic for rfc-interest@; I'll start it there.

> Content, minor: "in Normalization Form C (NFC) as defined in [UnicodeNorm]": 
> I recently learned this by accident, but Unicode Standard Annex #15 does no 
> longer actually define normalization. Paragraph 3 of the Introduction says 
> "For the formal specification of the Unicode Normalization Algorithm, see 
> Section 3.11, Normalization Forms in [Unicode].". So please change this at 
> least to "in Normalization Form C (NFC) as defined in Section 3.11, 
> Normalization Forms, in [UnicodeLatest] and [UnicodeNorm]".

Accidents with Unicode are so fun...
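
For readers unfamiliar with NFC, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The helper names `to_nfc` and `is_nfc` are hypothetical, and the result depends on the Unicode version bundled with the local Python build. It shows a name written with a base letter plus a combining diaeresis being composed into the single precomposed letter.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def is_nfc(text: str) -> bool:
    """Check whether text is already in NFC by comparing it with its normalized form."""
    return text == to_nfc(text)

decomposed = "Du\u0308rst"  # "u" followed by U+0308 COMBINING DIAERESIS
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))
print(is_nfc(decomposed), is_nfc(composed))
```

The two strings look the same when rendered but differ in length and in their code points, which is why a stated normalization form matters for comparison and searching.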

> Editorial, minor: For [UnicodeNorm] (if it's kept), change
> 'The Unicode Consortium, "Unicode Standard Annex", 2023' to
> 'The Unicode Consortium, "Unicode Standard Annex #15, Unicode Normalization 
> Forms", 2025'.

Will do.

Thanks again!

--Paul Hoffman
