Dear Chairs and WG members,
These are my last call comments on rfc7997bis. I read the document
last evening. I also read John Klensin's comments from Oct. 25 from top
to bottom, but I'm writing this mail separately with my own comments. I
may contribute to the discussion of John's comments at a later stage if
time permits.
I have to admit that I have not always had time to follow the WG
discussion in detail, although I tried to skim most emails.
My overall impression is that the direction of the draft, in trying to
be short, is okay. However, there are several issues of varying severity
that make the current draft unsuitable for forwarding to the IESG at
this point in time. Some of these issues are fundamental and therefore
severe, but they can all be fixed rather easily if there's agreement on
what to do.
I'm not listing minor grammatical mistakes, of which I have found quite
a few. These can be dealt with by the RPC.
Content, major: The draft needs to say that RFCs are written
(mainly/mostly) in English. I know this was discussed, but I haven't
seen the main argument, namely that we define policy and that this is
policy. And if this isn't policy, then nothing in this draft is.
Editorial, major: The abstract should be written so that it can be read
even in 10 or 20 years, which means it should not contain (and in
particular should not start with) historical references. As a first step, the
first paragraph of the abstract should move to the introduction, and the
first two sentences of the introduction should in turn move to the
abstract. After that, a bit of cleanup will be needed.
Content, major: Section 2 is entitled "Basic Requirements for Text in
RFCs". But the way it's written, it contains requirements for "readers
and browsers", people, maybe fonts, and searches. The text should be
rewritten to actually talk about text in RFCs. As an example, instead of
"RFCs should be displayed correctly across a wide range of readers and
browsers.", write "RFCs should only contain text that can be displayed
correctly across a wide range of readers and browsers.". The same applies
to the rest of the section.
Content, major: Section 3: "There are many Unicode characters that
obviously cannot be displayed (such as control characters), and many
whose ability to be displayed is debatable.": It's unclear what "many
whose ability to be displayed is debatable." means. I'd guess it refers
to scripts and characters standardized recently, for which font support
is still thin. If that's what is meant, please say so; if something else
is meant, please make clear what that is.
Content, major: Section 3 points to BCP137 for various notations. These
are all numeric. There are many places where numeric notation is
appropriate. But RFC7997 also recommends the use of Unicode character
names. I see no reason to change this, as support for this is also
available in RFC2XML. In some cases (see also below), character names
make an RFC more readable because they reduce additional lookups. (I
have nothing against mentioning that in some cases, Unicode character
names contain errors, and in these cases, an official alias should be used.)
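To make the difference between the two notations concrete, here is a minimal Python sketch using the standard `unicodedata` module. The helper names `code_point_notation` and `name_notation` are my own, chosen for illustration only; they are not taken from the draft or from any RFC tooling.

```python
import unicodedata


def code_point_notation(ch: str) -> str:
    """Return the numeric "U+NNNN" notation for a single character."""
    return f"U+{ord(ch):04X}"


def name_notation(ch: str) -> str:
    """Return the Unicode character name, or a placeholder if it has none."""
    return unicodedata.name(ch, "<no name>")


# Print both notations side by side for a few sample characters.
for ch in "ü€":
    print(ch, code_point_notation(ch), name_notation(ch))
```

As far as I know, `unicodedata.name` returns the formal character name rather than a corrected alias, so the few erroneous names mentioned above would need separate handling.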
Content, major (same paragraph): "If an RFC includes such characters in
normative or descriptive text, the RFC needs to also clearly describe
the character.": There may be cases, in particular for the correct
display of examples including bidirectional text in plain text, where we
want to use bidi control characters but we do not want to "describe"
them (because they are not needed in HTML or PostScript).
Content, major: 3.1 Names: This section confuses ASCII and Latin script.
If you look at recent RFCs such as RFC 9694 (sorry, that was just the
example that was easiest for me to find), the name is there in Latin
script (M.J. Dürst at the top, Martin J. Dürst at the end), without an
"ASCII interpretation". And there would be no point in forcing me to add
an "ASCII interpretation" next time I write an RFC. So please change
"These authors can give their names using only ASCII characters, or as
Unicode characters and an ASCII interpretation of their name." to
"Authors can give their names using only Latin script characters, or
using non-Latin script and an equivalent in Latin script." Please note
that this includes e.g. somebody (fictional) with a name of 加藤 竜太郎 with
a Latin (not ASCII) equivalent of Ryūtarō Katō (if the person prefers
this to the simpler Ryutaro Kato). Please also note that I'm using
"equivalent", not "interpretation". There's no interpretation involved.
Editorial, medium: Please remove "Authors of RFCs whose names include
non-ASCII characters will likely have preferences for how their names
are displayed based on their lived experiences." People, including
authors, just have names.
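To illustrate the ASCII versus Latin script distinction from my comment on 3.1, here is a rough Python sketch. It rests on an assumption: Python's standard library has no lookup for the Unicode Script property, so the check below uses the character name prefix "LATIN" as a stand-in. The function names are hypothetical and only for illustration.

```python
import unicodedata


def is_ascii_name(name: str) -> bool:
    """True if every character in the name is ASCII."""
    return name.isascii()


def is_latin_script_name(name: str) -> bool:
    """Heuristic: True if every letter's Unicode name starts with "LATIN".

    Assumption: the character-name prefix stands in for a real Script
    property check, which the standard library does not provide.
    """
    composed = unicodedata.normalize("NFC", name)
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in composed
        if ch.isalpha()  # skip spaces, periods, and other non-letters
    )


# Print the ASCII check and the Latin script check for each sample name.
for name in ["Martin J. Dürst", "Ryūtarō Katō", "Ryutaro Kato", "加藤 竜太郎"]:
    print(name, is_ascii_name(name), is_latin_script_name(name))
```

With this heuristic, the first two names come out as Latin script but not ASCII, the third as both, and the last as neither, which is the case where a Latin script equivalent is needed.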
Content, major: "Company names and geographic names generally do not
need ASCII interpretations, but they can be included at the discretion
of the author and the RPC.": This would mean that I could give my
affiliation as 青山学院大学 and my address as 相模原、日本 or so, which surely
can't be what we want.
Content, major: RFCs currently use last (family) name plus initial(s) in
many places, and we should change this (as a matter of policy if
necessary). The reason is that there are many people whose family
name isn't very informative. This is very common among Koreans, Chinese,
and Danes. It can also happen in other cultures.
Editorial, minor: 3.2 Examples: "giving the Unicode equivalent of the
non-ASCII characters": This is confusing because these characters will
be in UTF-8 and therefore will use Unicode. What we want to say is to
use Unicode code points or Unicode character names.
Editorial, major: When talking about color, the text says "If so, those
examples need to also include the "U+NNNN" syntax.". This excludes the
possibility of using Unicode character names. But as has been discussed in
earlier mail, in the example at hand, it would be much more helpful for
the reader to replace 'For example, "A color display should be able to
differentiate 🔴 (U+1F534), 🟢 (U+1F7E2), and 🔵 (U+1F535)."' with 'For
example, "A color display should be able to differentiate 🔴 (LARGE RED
CIRCLE), 🟢 (LARGE GREEN CIRCLE), and 🔵 (LARGE BLUE CIRCLE)."', because
it saves somebody with a black-and-white display some lookups.
Content, major: 5. Security: "Valid Unicode that matches the expected
text must be verified in order to preserve expected behavior and
protocol information.": It's totally unclear what this means, and who
should deal with it. Maybe this should read "Authors and the RPC should
cross-check that the characters used match their code point numbers or
Unicode character names." If something else is intended, please make
clearer what it is.
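If the cross-check reading is the intended one, it could be partly automated. The following Python sketch assumes annotations of the form "X (U+NNNN)" or "X (CHARACTER NAME)" directly after a single character; that pattern, the regular expression, and the name `find_mismatches` are my assumptions for illustration, not something the draft specifies.

```python
import re
import unicodedata

# Matches one character followed by a parenthesized annotation, e.g.
# "🔴 (U+1F534)" or "🔴 (LARGE RED CIRCLE)". Assumed format, see above.
ANNOTATED = re.compile(r"(\S) \((U\+[0-9A-F]{4,6}|[A-Z][A-Z0-9 -]+)\)")


def find_mismatches(text: str) -> list[tuple[str, str]]:
    """Return (character, annotation) pairs where the two disagree."""
    mismatches = []
    for ch, annotation in ANNOTATED.findall(text):
        if annotation.startswith("U+"):
            expected = f"U+{ord(ch):04X}"
        else:
            expected = unicodedata.name(ch, "")
        if annotation != expected:
            mismatches.append((ch, annotation))
    return mismatches


# The last annotation is deliberately wrong (it repeats the red circle's
# code point for the blue circle).
sample = "differentiate 🔴 (U+1F534), 🟢 (LARGE GREEN CIRCLE), and 🔵 (U+1F534)"
print(find_mismatches(sample))
```

A simple pattern like this will also pick up unrelated parenthesized uppercase text, so it is only a starting point for a human check by the authors and the RPC.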
Editorial, minor: The reference label "[UnicodeCurrent]" should be
changed to "[UnicodeLatest]", because that will help people who are
familiar with Unicode terminology. In the reference section, the year
should be removed because that's how the Unicode Consortium advises
citing the latest version, see e.g. "Version References" at
https://www.unicode.org/versions/Unicode17.0.0/. If the RFC Editor
doesn't allow removing the year, then at least 2025 should be used
(currently 2023).
Content, minor: "in Normalization Form C (NFC) as defined in
[UnicodeNorm]": I only learned this recently by accident, but Unicode
Standard Annex #15 no longer actually defines normalization.
Paragraph 3 of the Introduction says "For the formal specification of
the Unicode Normalization Algorithm, see Section 3.11, Normalization
Forms in [Unicode].". So please change this at least to "in
Normalization Form C (NFC) as defined in Section 3.11, Normalization
Forms, in [UnicodeLatest] and [UnicodeNorm]".
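For readers less familiar with NFC, here is a small Python sketch of what the requirement means in practice, using the standard `unicodedata` module (`is_normalized` needs Python 3.8 or later). The sample string is my own choice.

```python
import unicodedata

# "u" followed by U+0308 COMBINING DIAERESIS: displays like "ü" but is not NFC.
decomposed = "Du\u0308rst"

# NFC composes the pair into the single code point U+00FC.
composed = unicodedata.normalize("NFC", decomposed)

print(unicodedata.is_normalized("NFC", decomposed))  # False
print(unicodedata.is_normalized("NFC", composed))    # True
print(len(decomposed), len(composed))                # 6 5
```

Both strings look the same when displayed, which is why a check of this kind is easier to do by tool than by eye.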
Editorial, minor: For [UnicodeNorm] (if it's kept), change
'The Unicode Consortium, "Unicode Standard Annex", 2023' to
'The Unicode Consortium, "Unicode Standard Annex #15, Unicode
Normalization Forms", 2025'.
Regards, Martin.