On Fri, Jan 10, 2025 at 09:56:12PM -0700, Brian Inglis wrote:
> On 2025-01-09 14:34, Thomas Dickey wrote:
> > On Thu, Jan 09, 2025 at 11:15:23AM -0700, Brian Inglis wrote:
> > > Hi folks,
> > >
> > > Many sites are now using Character Entity Names defined under
> > >
> > > https://www.w3.org/TR/xml-entity-names/
> >
> > https://www.w3.org/TR/xml-entity-names/#source
> > https://www.w3.org/TR/xml-entity-names/bycodes.html
> > https://www.w3.org/TR/xml-entity-names/byalpha.html
> >
> > the former is about 184KB, and the latter about 386KB, with a lot of HTML
> > overhead.
> As they have to index character name strings not just codepoint combos, they
> probably need about an order of magnitude more space than compose data:
> ~50KB source with lots of overhead actually ~8KB.

I see.  I'm expecting other issues with zero-width-whatever, but will
(after current work on cdk & dialog) see about making a script to extract
the data from bycodes.html

-- 
Thomas E. Dickey <[email protected]>
https://invisible-island.net
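
For illustration only, here is a minimal sketch of what a script to extract the data from bycodes.html might look like. It is not the script referred to above. The table layout it assumes (a first cell holding one or more "U+XXXXX" codepoints, a glyph cell, then a cell of comma-separated entity names) is a guess that has not been checked against the real page, and the names `RowCollector`, `extract_entities` and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sample rows; the real bycodes.html markup may differ.
SAMPLE = """
<table>
<tr><th>Code</th><th>Glyph</th><th>Names</th><th>Description</th></tr>
<tr><td>U+00021</td><td>!</td><td>excl</td><td>EXCLAMATION MARK</td></tr>
<tr><td>U+00026</td><td>&amp;</td><td>amp, AMP</td><td>AMPERSAND</td></tr>
<tr><td>U+0003D U+020E5</td><td>&#x3D;&#x20E5;</td><td>bne</td><td>combo</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect every table row as a list of stripped cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_entities(html):
    """Map a tuple of codepoints to its list of entity names.

    Assumes column 0 holds the codepoints and column 2 the names.
    """
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    table = {}
    for row in parser.rows:
        # Skip header rows and anything that does not start with a codepoint.
        if len(row) < 3 or not row[0].startswith("U+"):
            continue
        codepoints = tuple(
            int(part[2:], 16) for part in row[0].split() if part.startswith("U+")
        )
        names = [name.strip() for name in row[2].split(",") if name.strip()]
        table[codepoints] = names
    return table


if __name__ == "__main__":
    for codepoints, names in extract_entities(SAMPLE).items():
        print(" ".join(f"U+{cp:04X}" for cp in codepoints), names)
```

Keying on a tuple of codepoints rather than a single integer leaves room for the multi-codepoint combinations mentioned in the quoted text. To try it on the real page, the column indexes would need adjusting to match its actual markup.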
