[Rswg] Re: [Ext] Last call comments on draft-rswg-rfc7997bis

Martin J. Dürst Sun, 02 Nov 2025 18:10:35 -0800

Hello Rob, others,

On 2025-11-03 08:28, Rob Sayre wrote:

On 11/2/25 5:03 AM, Pete Resnick wrote:
On 31 Oct 2025, at 7:57, Martin J. Dürst wrote:
On 2025-10-29 09:33, Paul Hoffman wrote:
On Oct 28, 2025, at 01:35, Martin J. Dürst <[email protected]> wrote:
Content, major: Section 3: "There are many Unicode characters that obviously cannot be displayed (such as control characters), and many whose ability to be displayed is debatable.": It's unclear what "many whose ability to be displayed is debatable." means. I'd guess it refers to scripts and characters standardized recently, for which font support is still thin. If that's what is meant, please say so; if something else is meant, please make clear what that is.
There is a wide variety of things that can be debatable. Are combining characters like U+0315 (COMBINING COMMA ABOVE RIGHT) displayable? What about non-spacing marks like U+0650 (ARABIC KASRA)? I am sure people would take each side of the debate ("I can see the symbol printed in the Unicode Standard" vs. "I can't see that code point on my laptop even though it has quite a complete font set" and so on).
On any decent browser, these should display without problems. When it comes to editors, shells, and the like, the field is much wider, so there are no absolute guarantees. But these are in Unicode since Unicode 1.0 or so, so I would expect these to show.
I will leave it to you and Paul to replace "debatable" with something clearer.
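
For readers who want to check how the two code points mentioned above are classified, here is a minimal sketch using Python's standard unicodedata module. The helper name describe() is made up for illustration. It only reports Unicode character properties; it says nothing about whether a given font or terminal can actually render the character, which is the part under debate.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, bool]:
    """Return (name, general category, is_combining) for one code point.

    describe() is a hypothetical helper for this example only.
    Control characters have no formal name, so a placeholder is used.
    """
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)
    # A non-zero canonical combining class means the character
    # attaches to a preceding base character.
    is_combining = unicodedata.combining(ch) != 0
    return name, category, is_combining


if __name__ == "__main__":
    # U+0315 and U+0650 are the examples from the thread; U+0007 (BEL)
    # stands in for "obviously cannot be displayed".
    for ch in ("\u0315", "\u0650", "\u0007"):
        print(f"U+{ord(ch):04X}", describe(ch))
```

Both marks are in general category Mn (nonspacing mark), so they are normally shown attached to a base character rather than on their own, which may be one reason people disagree about whether they are "displayable".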
Hi,

There is an entire RFC about this, which Paul co-wrote.

https://www.rfc-editor.org/rfc/rfc9839.html

Last time I checked, none of the characters excluded in any of the sets defined in RFC 9839 had any chance whatsoever to turn up in names of people or companies or places.

What you may be missing is that social networks have character counts, and they sure do go after these issues.
These systems do in fact count a "family" as one character, not multiples with ZWNJs. Once you understand that, it gets a little cleaner.

I know. At a Unicode Conference many years back, I learned (directly from the person who initiated that change) that Twitter had switched from counting bytes to counting code points, which was the first step in that direction.
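
To make the difference between these ways of counting concrete, here is a rough sketch in Python. The function names are invented for this example, and the last counter is a deliberate simplification: it only merges code points joined by U+200D ZERO WIDTH JOINER (the joiner used in family emoji sequences) and is not a full grapheme cluster implementation per UAX #29. It is not how Twitter or any other service actually counts.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, used to build family emoji sequences

# MAN + ZWJ + WOMAN + ZWJ + GIRL, usually rendered as one "family" glyph.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"


def utf8_bytes(text: str) -> int:
    """Length in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def code_points(text: str) -> int:
    """Length in Unicode code points (Python's native str length)."""
    return len(text)


def zwj_joined_units(text: str) -> int:
    """Simplified count treating ZWJ-joined code points as one unit.

    Assumption for illustration: only ZWJ joins are handled; combining
    marks, variation selectors, and so on are ignored.
    """
    units = 0
    after_joiner = False  # True if the previous code point was a ZWJ
    for ch in text:
        if ch == ZWJ:
            after_joiner = True
            continue
        if not after_joiner:
            units += 1
        after_joiner = False
    return units


if __name__ == "__main__":
    print(utf8_bytes(family))        # 18
    print(code_points(family))       # 5
    print(zwj_joined_units(family))  # 1
```

The same string is 18, 5, or 1 "characters" depending on the unit chosen, which is why a length policy has to say which unit it means.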

But we are currently not looking at writing policy about length restrictions, so I think this is irrelevant. [It's also irrelevant because of the low (=zero?) likelihood of somebody having a family emoji, or any emoji for that matter, in their name.]


Regards,    Martin.

I wrote it:
https://github.com/sayrer/twitter-text/blob/main/rust/parser/src/twitter_text.pest#L344
So, having written code that says:

"// Zombies, genies, dancers, and wrestlers"

I am a little tired of these discussions.
But I have it in a coherent (PEST) grammar. The tough problems are URLs with no protocol and languages that do not require whitespace. So, if you click that link, look at "URL Without Protocol".
I am down (down desu) to really go after this issue, but it is difficult. Mine is the best so far, though.
thanks,
Rob


--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
