Dear Chairs and WG members,

These are my last call comments on rfc7997bis. I read the document last evening. I also read John Klensin's comments from Oct. 25 from top to bottom, but I'm writing this mail separately with my own comments. I may contribute to the discussion of John's comments at a later stage if time permits.

I have to admit that I have not always had time to follow the WG discussion in detail, although I tried to skim most emails.

My overall impression is that the direction of the draft, in trying to be short, is okay. However, there are several issues of various severity that make the current draft unsuitable for forwarding to the IESG at this point in time. Some of these issues are fundamental and therefore severe, but they can all be fixed rather easily if there's agreement on what to do.

I'm not listing minor grammatical mistakes, of which I have found quite a few. These can be dealt with by the RPC.


Content, major: The draft needs to say that RFCs are written (mainly/mostly) in English. I know this was discussed, but I haven't seen the main argument, namely that we define policy and that this is policy. And if this isn't policy, then nothing in this draft is.

Editorial, major: The abstract should be written so that it can still be read in 10 or 20 years. This means it should not contain historic references, and in particular shouldn't start with them. As a start, the first paragraph of the abstract should move to the introduction, and the first two sentences of the introduction should in turn move to the abstract. After that, a bit of cleanup will be needed.

Content, major: Section 2 is entitled "Basic Requirements for Text in RFCs". But the way it's written, it contains requirements for "readers and browsers", people, maybe fonts, and searches. The text should be rewritten to actually talk about text in RFCs. As an example, instead of "RFCs should be displayed correctly across a wide range of readers and browsers.", write "RFCs should only contain text that can be displayed correctly across a wide range of readers and browsers.". The same applies to the rest of the section.

Content, major: Section 3: "There are many Unicode characters that obviously cannot be displayed (such as control characters), and many whose ability to be displayed is debatable.": It's unclear what "many whose ability to be displayed is debatable." means. I'd guess it refers to scripts and characters standardized recently, for which font support is still thin. If that's what is meant, please say so; if something else is meant, please make clear what that is.

Content, major: Section 3 points to BCP137 for various notations. These are all numeric. There are many places where numeric notation is appropriate. But RFC7997 also recommends the use of Unicode character names. I see no reason to change this, as support for this is also available in xml2rfc. In some cases (see also below), character names make an RFC more readable because they save the reader additional lookups. (I have nothing against mentioning that some Unicode character names contain errors, and that in these cases the corresponding formal name alias should be used.)

Content, major (same paragraph): "If an RFC includes such characters in normative or descriptive text, the RFC needs to also clearly describe the character.": There may be cases, in particular for correctly displaying examples that include bidirectional text in plain text, where we want to use bidi control characters but do not want to "describe" them (because they are not needed in HTML or PostScript).

Content, major: 3.1 Names: This section confuses ASCII and Latin script. If you look at recent RFCs such as RFC 9694 (sorry, that was just the example that was easiest for me to find), my name appears in Latin script (M.J. Dürst at the top, Martin J. Dürst at the end), without an "ASCII interpretation". And there would be no point in forcing me to add an "ASCII interpretation" the next time I write an RFC. So please change "These authors can give their names using only ASCII characters, or as Unicode characters and an ASCII interpretation of their name." to "Authors can give their names using only Latin script characters, or using non-Latin script and an equivalent in Latin script." Please note that this covers, for example, a (fictional) person named 加藤 竜太郎 whose Latin-script (not ASCII) equivalent is Ryūtarō Katō (if the person prefers this to the simpler Ryutaro Kato). Please also note that I'm using "equivalent", not "interpretation". No interpretation is involved.

Editorial, medium: Please remove "Authors of RFCs whose names include non-ASCII characters will likely have preferences for how their names are displayed based on their lived experiences." People, including authors, just have names.

Content, major: "Company names and geographic names generally do not need ASCII interpretations, but they can be included at the discretion of the author and the RPC.": This would mean that I could give my affiliation as 青山学院大学 and my address as 相模原、日本 or so, but surely that can't be what we want.

Content, major: RFCs currently use last (family) name plus initial(s) in many places, and we should change this (as a matter of policy if necessary). The reason is that there are many people for whom the family name alone isn't very informative. This is very common for Korean, Chinese, and Danish names. It can also happen in other cultures.

Editorial, minor: 3.2 Examples: "giving the Unicode equivalent of the non-ASCII characters": This is confusing because these characters will be in UTF-8 and therefore will use Unicode. What we want to say is to use Unicode code points or Unicode character names.

Editorial, major: When talking about color, the text says "If so, those examples need to also include the "U+NNNN" syntax.". This excludes the possibility of using Unicode character names. But as has been discussed in previous mails, in the example at hand it would be much more helpful for the reader to replace 'For example, "A color display should be able to differentiate 🔴 (U+1F534), 🟢 (U+1F7E2), and 🔵 (U+1F535)."' with 'For example, "A color display should be able to differentiate 🔴 (LARGE RED CIRCLE), 🟢 (LARGE GREEN CIRCLE), and 🔵 (LARGE BLUE CIRCLE)."', because it saves somebody with a black-and-white display some lookups.

Content, major: 5. Security: "Valid Unicode that matches the expected text must be verified in order to preserve expected behavior and protocol information.": It's totally unclear what this means, and who should deal with it. Maybe this should read "Authors and the RPC should cross-check that the used characters match their code point numbers or Unicode character names." If something else is intended, please make clear what that is.
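To illustrate what such a cross-check could look like, here is a minimal sketch in Python using only the standard re and unicodedata modules. The helper name annotation_matches is hypothetical; nothing here is taken from the draft or from the RPC's tooling. It assumes each character is annotated either as "U+NNNN" or with a Unicode character name.

```python
import re
import unicodedata

# Matches the numeric notation "U+" followed by 4 to 6 uppercase hex digits.
_CODE_POINT = re.compile(r"U\+([0-9A-F]{4,6})")


def annotation_matches(char: str, annotation: str) -> bool:
    """Return True if the single character `char` matches `annotation`,
    which is either a code point ("U+1F534") or a character name."""
    if len(char) != 1:
        return False  # only single code points are checked in this sketch
    m = _CODE_POINT.fullmatch(annotation)
    if m:
        # Numeric notation: compare code point values.
        return ord(char) == int(m.group(1), 16)
    try:
        # Name notation: unicodedata.lookup() also accepts formal name aliases.
        return unicodedata.lookup(annotation) == char
    except KeyError:
        return False  # not a known character name or alias


# Pairs modeled on the color example in the draft.
checks = [
    ("\U0001F534", "U+1F534"),
    ("\U0001F534", "LARGE RED CIRCLE"),
    ("\U0001F7E2", "LARGE BLUE CIRCLE"),  # deliberate mismatch
]
for char, annotation in checks:
    print(annotation, annotation_matches(char, annotation))
# U+1F534 True
# LARGE RED CIRCLE True
# LARGE BLUE CIRCLE False
```

Because unicodedata.lookup() accepts formal name aliases, the same check also works for the case mentioned above where a character name contains an error and the alias is used instead.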

Editorial, minor: The reference label "[UnicodeCurrent]" should be changed to "[UnicodeLatest]", because that will help people who are familiar with Unicode terminology. In the reference section, the year should be removed because that's how the Unicode Consortium advises citing the latest version, see e.g. "Version References" at https://www.unicode.org/versions/Unicode17.0.0/. If the RFC Editor doesn't allow removing the year, then at least 2025 should be used (currently 2023).

Content, minor: "in Normalization Form C (NFC) as defined in [UnicodeNorm]": I recently learned this by accident, but Unicode Standard Annex #15 no longer actually defines normalization. Paragraph 3 of the Introduction says "For the formal specification of the Unicode Normalization Algorithm, see Section 3.11, Normalization Forms in [Unicode].". So please change this at least to "in Normalization Form C (NFC) as defined in Section 3.11, Normalization Forms, in [UnicodeLatest] and [UnicodeNorm]".
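For readers less familiar with NFC, a minimal sketch in Python (standard unicodedata module, Python 3.8 or later for is_normalized; the helper name to_nfc is only for illustration) shows what the requirement means in practice, using the ō in the Katō example from the Names comment above:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return `text` converted to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


precomposed = "Kat\u014d"  # ō as U+014D LATIN SMALL LETTER O WITH MACRON
decomposed = "Kato\u0304"  # o followed by U+0304 COMBINING MACRON

print(unicodedata.is_normalized("NFC", precomposed))  # True
print(unicodedata.is_normalized("NFC", decomposed))   # False
print(to_nfc(decomposed) == precomposed)              # True
```

Both strings display the same, but only the precomposed form is in NFC; normalization maps the decomposed form onto it.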

Editorial, minor: For [UnicodeNorm] (if it's kept), change
'The Unicode Consortium, "Unicode Standard Annex", 2023' to
'The Unicode Consortium, "Unicode Standard Annex #15, Unicode Normalization Forms", 2025'.


Regards, Martin.

--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]