My hobby code supports only EBCDIC and UTF-8, avoiding some endian issues :-)
AFAICT what MS called 'Unicode' was UCS-2 (fixed at 16 bits). I'd add UTF-32 if I thought it would be any use, but proper UTF-16 (variable-width chars) I can't raise the enthusiasm for :-)

Roops

---
"Mundus sine Caesaribus"

On Wed, 27 Aug 2025, 10:10 Leland Bond, <00000d7433ac18a9-dmarc-requ...@listserv.uga.edu> wrote:

> On 26 Aug 2025, at 23:53, Phil Smith III <li...@akphs.com> wrote:
> >
> > Without commenting on UTF-EBCDIC, I think I can answer:
> >> What need does UTF-8 address?
> >
> > Fitting the BMP (plus) into as little space as possible. Now, in this modern world of large storage devices and high bandwidth, it's not clear that UTF-8 is worth the hassle--but it's entrenched, which makes it important. Or at least here to stay.
>
> UTF-8 is critically important outside of the EBCDIC enclave since the first 128 characters are identical to US-ASCII-7. Compatibility with decades of code is critical.
>
> > Personally, I think UTF-16 would make life easier in many, many cases.
>
> Just as ASCII and EBCDIC are too US-centric, UTF-16 is too old-European-centric. I rarely find software claiming UTF-16 support that correctly supports UTF-16 encoded characters above U+FFFF. Very simply, when I see UTF-16, I assume the software involved is broken.
>
> With UTF-32 there is no question about at least accepting the full range of Unicode characters. And UTF-32 is fixed-width, so counting characters is easy, unlike UTF-8 and UTF-16. Since unlimited storage and bandwidth are now available, why bother with UTF-16? ¡Just use UTF-32! But if one believes in limits, UTF-8 is almost always more compact than UTF-16 and UTF-32 while being a compatible superset of US-ASCII-7.
>
> Bringing this discussion back to the z/Architecture instruction set, it once seemed unnecessary to me that there were instructions for handling UTF-16, such as CUTFU. But I later realized IBM added many of the instructions specifically for their JVM. Java strings are UTF-16. Basic things like determining the number of characters in a Java string require special processing, so I expect most Java applications incorrectly handle characters above U+FFFF, such as the characters common to ALL modern scripts: emoji. 😀 (U+1F600)
>
> David
>
> P.S. I am not dumping on Java as a language. All human and programming languages have their quirks. I have coded in well over 20 languages, and find Java to be far from the worst. The original design choice to use UCS-2 for Java strings and the inability to move beyond UTF-16 is by far my biggest criticism. But I get how hard it is to change these design choices. Python 2 is still heavily used despite being insecure and unsupported simply because Python 3 changed strings from ASCII to Unicode and some people really don't like change.
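
To put some rough numbers behind David's points about ASCII compatibility and compactness, here's a small Java sketch (the class name EncodingSizes and the sample strings are made up purely for illustration). It checks that the UTF-8 bytes of pure ASCII text are byte-for-byte identical to the US-ASCII encoding, then compares the encoded sizes of an ASCII string and a mixed string in UTF-8, UTF-16 and UTF-32:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingSizes {
    public static void main(String[] args) {
        // UTF-32 is not in StandardCharsets, but the JDK ships a charset for it.
        Charset utf32 = Charset.forName("UTF-32BE");

        String ascii = "Hello, world";
        // Latin-1, CJK and an emoji above U+FFFF, built from its code point.
        String mixed = "Grüße, 世界 " + new String(Character.toChars(0x1F600));

        // For the first 128 code points, UTF-8 bytes are identical to US-ASCII bytes.
        System.out.println(Arrays.equals(
                ascii.getBytes(StandardCharsets.US_ASCII),
                ascii.getBytes(StandardCharsets.UTF_8)));          // true

        for (String s : new String[] { ascii, mixed }) {
            System.out.printf("%s: UTF-8=%d UTF-16=%d UTF-32=%d bytes%n",
                    s,
                    s.getBytes(StandardCharsets.UTF_8).length,
                    s.getBytes(StandardCharsets.UTF_16BE).length,  // explicit endianness, no BOM
                    s.getBytes(utf32).length);
        }
    }
}

For pure ASCII the ratio is 1:2:4 in UTF-8's favour; the usual caveat to "almost always more compact" is CJK-heavy text, where UTF-16 comes out smaller (2 bytes per character vs 3), though UTF-32 never beats UTF-8.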
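
And on the Java string point specifically, a minimal sketch (class name CodePointCount is again just illustrative): String.length() counts UTF-16 code units, so a lone emoji reports a length of 2, and you need codePointCount() or codePoints() to count and walk actual characters.

public class CodePointCount {
    public static void main(String[] args) {
        // U+1F600 is above U+FFFF, so UTF-16 stores it as a surrogate pair (two chars).
        String grin = new String(Character.toChars(0x1F600));

        System.out.println(grin.length());                          // 2 -- UTF-16 code units
        System.out.println(grin.codePointCount(0, grin.length()));  // 1 -- actual characters

        // Iterating by code point avoids splitting the surrogate pair.
        grin.codePoints().forEach(cp -> System.out.printf("U+%X%n", cp)); // U+1F600
    }
}

Anything that indexes with charAt() and never checks Character.isSurrogate() will split that pair, which is exactly the breakage above U+FFFF that David describes.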