On Sun, Nov 2, 2025 at 6:10 PM Martin J. Dürst <[email protected]> wrote:
> Hello Rob, others,
>
> On 2025-11-03 08:28, Rob Sayre wrote:
> >
> >
> > On 11/2/25 5:03 AM, Pete Resnick wrote:
> >> On 31 Oct 2025, at 7:57, Martin J. Dürst wrote:
> >>
> >>> On 2025-10-29 09:33, Paul Hoffman wrote:
> >>>>
> >>>> On Oct 28, 2025, at 01:35, Martin J. Dürst <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Content, major: Section 3: "There are many Unicode characters that
> >>>>> obviously cannot be displayed (such as control characters), and
> >>>>> many whose ability to be displayed is debatable.": It's unclear
> >>>>> what "many whose ability to be displayed is debatable." means. I'd
> >>>>> guess it refers to scripts and characters standardized recently,
> >>>>> for which font support is still thin. If that's what is meant,
> >>>>> please say so; if something else is meant, please make clear what
> >>>>> that is.
> >>>>
> >>>> There is a wide variety of things that can be debatable. Are
> >>>> combining characters like U+0315 (COMBINING COMMA ABOVE RIGHT)
> >>>> displayable? What about non-spacing marks like U+0650 (ARABIC
> >>>> KASRA)? I am sure people would take each side of the debate ("I can
> >>>> see the symbol printed in the Unicode Standard" vs. "I can't see
> >>>> that code point on my laptop even though it has quite a complete
> >>>> font set" and so on).
> >>>
> >>> On any decent browser, these should display without problems. When it
> >>> comes to editors, shells, and the like, the field is much wider, so
> >>> there are no absolute guarantees. But these are in Unicode since
> >>> Unicode 1.0 or so, so I would expect these to show.
> >>
> >> I will leave it to you and Paul to replace "debatable" with something
> >> clearer.
> >>
> >
> >
> > Hi,
> >
> > There is an entire RFC about this, which Paul co-wrote.
> >
> > https://www.rfc-editor.org/rfc/rfc9839.html
>
> Last time I checked, none of the characters excluded in any of the sets
> defined in RFC 9839 had any chance whatsoever to turn up in names of
> people or companies or places.
>
> > What you may be missing is that social networks have character counts,
> > and they sure do go after these issues.
> >
> > These systems do in fact count a "family" as one character, not
> > multiples with ZWNJs. Once you understand that, it gets a little cleaner.
>
> I know. At a Unicode Conference many years back, I learned (directly
> from the person who initiated that change) that Twitter had switched
> from counting bytes to counting code points, which was the first step in
> that direction.
>
> But we are currently not looking at writing policy about length
> restrictions, so I think this is irrelevant. [It's also irrelevant
> because of the low (=zero?) likeliness of somebody having a family
> emoji, or any emoji for that, in their name.]
>

You need them in Arabic and Persian (not even the correct name there, but
let's carry on).

https://www.w3.org/TR/2025/DNOTE-alreq-20251002/

Here, we can go for

4.3.4.1 Disjoining Enforcement

or

4.3.4.2 Joining Enforcement

or

4.3.4.3 Joining-Disjoining Enforcement

I am pretty sure you know this stuff, but most others probably don't.

We could use this last name:

علیرضا (AliReza)

thanks,
Rob
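
For readers less familiar with the joiner characters discussed above, here is a minimal Python sketch of how byte counts and code point counts differ once a joiner is present. It rests on some assumptions: the helper name `describe` is hypothetical, and the spelling with U+200C (ZERO WIDTH NON-JOINER) between the two parts of the name is used only for illustration; the name as typed above may or may not contain one. The standard library has no grapheme-cluster segmentation, so the "counts as one character" behaviour of social networks is not reproduced here.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def describe(text):
    """Return (UTF-8 byte count, code point count, names of format characters)."""
    # General category "Cf" (format) covers both ZWNJ and ZWJ.
    format_chars = [
        unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Cf"
    ]
    return len(text.encode("utf-8")), len(text), format_chars


# Illustrative spellings, written as escapes so the invisible character is explicit.
name_joined = "\u0639\u0644\u06CC\u0631\u0636\u0627"
name_disjoined = "\u0639\u0644\u06CC" + ZWNJ + "\u0631\u0636\u0627"

# Man + ZWJ + woman + ZWJ + girl: one possible "family" emoji sequence.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

for label, text in [
    ("joined", name_joined),
    ("disjoined", name_disjoined),
    ("family", family),
]:
    print(label, describe(text))
```

The two spellings of the name differ by a single invisible code point (three bytes in UTF-8), and the emoji sequence is five code points even though it is typically rendered as one glyph.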
--
rswg mailing list -- [email protected]
To unsubscribe send an email to [email protected]
