Hello Pete, everybody,
Two minor points inline.
On 2025-11-02 22:03, Pete Resnick wrote:
On 31 Oct 2025, at 7:57, Martin J. Dürst wrote:
On 2025-10-29 09:33, Paul Hoffman wrote:
On Oct 28, 2025, at 01:35, Martin J. Dürst <[email protected]>
wrote:
Content, major: Section 3: "There are many Unicode characters that
obviously cannot be displayed (such as control characters), and many
whose ability to be displayed is debatable.": It's unclear what
"many whose ability to be displayed is debatable." means. I'd guess
it refers to scripts and characters standardized recently, for which
font support is still thin. If that's what is meant, please say so;
if something else is meant, please make clear what that is.
There is a wide variety of things that can be debatable. Are
combining characters like U+0315 (COMBINING COMMA ABOVE RIGHT)
displayable? What about non-spacing marks like U+0650 (ARABIC KASRA)?
I am sure people would take each side of the debate ("I can see the
symbol printed in the Unicode Standard" vs. "I can't see that code
point on my laptop even though it has quite a complete font set" and
so on).
On any decent browser, these should display without problems. When it
comes to editors, shells, and the like, the field is much wider, so
there are no absolute guarantees. But these have been in Unicode since
Unicode 1.0 or so, so I would expect them to show.
I will leave it to you and Paul to replace "debatable" with something
clearer.
I'll gladly contribute to text once I have understood what we want to
say. Is it about formatting characters such as bidi controls and the
like? Is it about characters added to Unicode very recently?
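
To make the candidate groups concrete, here is a rough sketch that looks up the Unicode general category of the characters mentioned so far. It assumes Python's standard unicodedata module; the describe() helper and the choice of sample characters are only for illustration, not anything from the draft.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    # Control characters have no name in the Unicode database, hence the default.
    return (f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<no name>"),
            unicodedata.category(ch))

# Samples: the two marks from the thread, one bidi control, one C0 control.
for ch in ["\u0315", "\u0650", "\u202B", "\u0007"]:
    print(*describe(ch))

# Categories: Mn = nonspacing mark, Cf = format character, Cc = control.
```

The categories do not say whether a given font or terminal can render a character; they only separate marks, format characters, and controls, which may help in choosing a clearer word than "debatable".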
Content, major (same paragraph): "If an RFC includes such characters
in normative or descriptive text, the RFC needs to also clearly
describe the character.": There may be cases, in particular for the
correct display of examples including bidirectional text in plain
text, where we want to use bidi control characters but we do not
want to "describe" them (because they are not needed in HTML or
PostScript).
But I'm not talking about RTL characters such as Hebrew and Arabic.
I'm talking about BIDI control characters, which are invisible (except
that they may affect how the graphic characters close to them are
ordered). If we need to insert such characters, we shouldn't
necessarily talk about these characters, but about how we expect them
to reorder the rest of the text (so that readers can check whether
they see the text in the order the author expected them to see it).
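
As an illustration of why these characters are hard to "describe" in running text, the following sketch replaces each bidi control with a visible tag, so that one can at least check where they sit in a string. The helper name reveal_bidi_controls and the tag format are assumptions made for this example; the abbreviations are the usual Unicode ones.

```python
# Bidi control characters and their common abbreviations.
BIDI_CONTROLS = {
    "\u061C": "ALM",  # ARABIC LETTER MARK
    "\u200E": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200F": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}

def reveal_bidi_controls(text):
    """Replace each bidi control in text with a visible <ABBR> tag."""
    return "".join(
        f"<{BIDI_CONTROLS[ch]}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )

print(reveal_bidi_controls("abc \u202Bdef\u202C ghi"))
```

This only shows where the controls are in logical order; it does not compute the resulting display order, which is what the surrounding text would still need to describe.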
Chair hat off, a text suggestion: "If an RFC includes such characters in
normative or descriptive text, the RFC needs to also clearly describe
the character or, as in the case of some control characters, describe
the effect of the character."
Good direction. I'd suggest a slight additional tweak:
"If an RFC includes such characters in normative or descriptive text,
the RFC needs to also clearly describe the characters or, as in the case
of some control characters, describe the effect of the characters."
Using the plural here makes it easier to understand that in some cases,
it may be appropriate to describe them as a group, e.g. in their
combined effect, as opposed to requiring character by character
descriptions even if that's not appropriate.
In particular, some authors with Han / Kanji names have asked that
their names be spelled with Latin characters, others have asked for
their names to only be spelled with Han / Kanji, and yet others want
both (often with the Latin of their family name in all caps). These
are preferences that I think should be acknowledged and honored when
sensible, even if they bug some other people.
In general, I agree. Only using Latin should of course be possible.
Only using Han/Kanji (or any other non-Latin script) I think is a big
disservice to the reader, and I'm glad that our current document, as
far as I understand it, disallows this. As for putting the family name
in all caps, I think that's a style issue that should be left to the RPC.
So you're only looking for a change to the first two sentences to say
that all authors, even those who might write their names with non-ASCII
characters in other circumstances, can choose to give their names in
only ASCII characters in an RFC if that is their preference, and if they
choose to use non-ASCII characters, they need to provide an ASCII
interpretation of their name.
My understanding is that the current -05 draft, with "These authors can
give their names using only ASCII characters, or as Unicode characters
and an ASCII interpretation of their name." already includes that.
My main request is to change ASCII to Latin script. The text I'm
proposing is:
"These authors can give their names using only Latin script characters,
or as non-Latin script and a Latin-script equivalent of their name."
I prefer "equivalent" to "interpretation", because for me
"interpretation" evokes something like "oh, the spelling of this name
suggests the author's ancestors may have been of French origin, quite
possibly from the nobility". Equivalence just means that it's the same,
in some way (see https://en.wikipedia.org/wiki/Equivalence_relation). In
our case, it's the same if it denotes the same person.
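
To show why "ASCII" and "Latin script" are not the same requirement, here is a small sketch. Python's standard library does not expose the Unicode Script property, so is_latin_script() below is only a heuristic based on character names, and both the function and the sample names are assumptions made for this example.

```python
import unicodedata

def is_latin_script(name):
    """Heuristic: True if the name has letters and all of them are Latin.

    A letter counts as Latin when its Unicode character name starts
    with 'LATIN'. Spaces, hyphens, and combining marks are ignored.
    """
    letters = [ch for ch in name if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )

for sample in ["Durst", "D\u00FCrst", "\u5C71\u7530"]:
    print(sample, sample.isascii(), is_latin_script(sample))
```

The second sample is Latin script without being ASCII, which is the case the proposed wording is meant to allow.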
Regards, Martin.
--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]