My hobby code supports only EBCDIC and UTF-8, avoiding some endian issues
:-)

AFAICT what MS called 'Unicode' was UCS-2 (fixed at 16 bits). I'd add
UTF-32 if I thought it would be any use, but proper UTF-16 (variable width
chars) I can't raise the enthusiasm for :-)
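
In case it helps anyone, here is a quick Java sketch of what that "variable
width" actually means (Java only because its strings happen to be sequences
of UTF-16 code units): a code point above U+FFFF turns into a surrogate
pair, which plain UCS-2 simply cannot represent.

    // SurrogateDemo.java - illustrative sketch, not production code.
    public class SurrogateDemo {
        public static void main(String[] args) {
            int cp = 0x1F600;                           // U+1F600, outside the BMP

            // Encode: anything above U+FFFF needs two 16-bit code units.
            int v   = cp - 0x10000;                     // 20-bit value
            char hi = (char) (0xD800 + (v >> 10));      // high surrogate
            char lo = (char) (0xDC00 + (v & 0x3FF));    // low surrogate
            System.out.printf("U+%04X -> %04X %04X%n", cp, (int) hi, (int) lo);

            // Decode: recombine the pair back into one code point.
            int back = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
            System.out.printf("recombined: U+%04X%n", back);
        }
    }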

Roops
---
"Mundus sine Caesaribus"

On Wed, 27 Aug 2025, 10:10 Leland Bond, <
00000d7433ac18a9-dmarc-requ...@listserv.uga.edu> wrote:

> On 26 Aug 2025, at 23:53, Phil Smith III <li...@akphs.com> wrote:
> >
> > Without commenting on UTF-EBCDIC, I think I can answer:
> >> What need does UTF-8 address?
> >
> > Fitting the BMP (plus) into as little space as possible. Now, in this
> modern world of large storage devices and high bandwidth, it's not clear
> that UTF-8 is worth the hassle--but it's entrenched, which makes it
> important. Or at least here to stay.
>
> UTF-8 is critically important outside of the EBCDIC enclave since its
> first 128 code points are encoded exactly as in US-ASCII-7. Compatibility
> with decades of existing code is essential.
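
To make that compatibility point concrete, a minimal sketch of my own (Java
again, purely for illustration): pure 7-bit text produces byte-for-byte
identical output whether it is encoded as US-ASCII or as UTF-8.

    // AsciiUtf8Demo.java - sketch of UTF-8's ASCII compatibility.
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class AsciiUtf8Demo {
        public static void main(String[] args) {
            String s = "Hello, world!";   // pure 7-bit ASCII text

            byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
            byte[] utf8  = s.getBytes(StandardCharsets.UTF_8);

            // Identical byte sequences: old ASCII-only code keeps working
            // as long as the data stays within the first 128 code points.
            System.out.println(Arrays.equals(ascii, utf8));   // true

            // One character outside ASCII and the two encodings diverge.
            byte[] accented = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
            System.out.println(accented.length);              // 5 bytes, not 4
        }
    }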
>
> > Personally, I think UTF-16 would make life easier in many, many cases.
>
> Just as ASCII and EBCDIC are too US-centric, UTF-16 is too
> old-European-centric. I rarely find software claiming UTF-16 support that
> correctly handles characters above U+FFFF, which require surrogate pairs.
> Very simply, when I see UTF-16, I assume the software involved is broken.
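
Here is roughly what that breakage looks like in practice, as a rough
sketch (Java again, only because its strings are UTF-16 code units): naive
index arithmetic slices straight through a surrogate pair and leaves behind
a lone surrogate that no longer round-trips.

    // BrokenUtf16Demo.java - sketch of the classic "UTF-16 support" bug.
    import java.nio.charset.StandardCharsets;

    public class BrokenUtf16Demo {
        public static void main(String[] args) {
            String s = "a\uD83D\uDE00b";   // 'a', U+1F600 as a surrogate pair, 'b'

            // Naive "first two characters" logic counts 16-bit units,
            // so it cuts the surrogate pair in half.
            String cut = s.substring(0, 2);
            System.out.println(cut.endsWith("\uD83D"));   // true: lone high surrogate

            // The damaged string no longer survives a trip through UTF-8;
            // the lone surrogate is replaced with '?' on encoding.
            byte[] bytes = cut.getBytes(StandardCharsets.UTF_8);
            System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(cut));   // false
        }
    }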
>
> With UTF-32 there is no question about at least accepting the full range
> of Unicode characters. And UTF-32 is fixed-width, so counting characters is
> easy, unlike UTF-8 and UTF-16. Since unlimited storage and bandwidth are
> now available, why bother with UTF-16?  ¡Just use UTF-32!  But if one
> believes in limits, UTF-8 is almost always more compact than UTF-16 and
> UTF-32 while being a compatible superset of US-ASCII-7.
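
For a feel of the size trade-off, a small sketch comparing encoded lengths.
One assumption to flag: the "UTF-32BE" charset name below is what current
JDKs ship, but check availability on your own platform.

    // EncodingSizeDemo.java - rough size comparison of UTF-8/16/32.
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class EncodingSizeDemo {
        public static void main(String[] args) {
            // Mostly ASCII text plus one character outside the BMP.
            String s = "Hello z/Architecture \uD83D\uDE00";

            int u8  = s.getBytes(StandardCharsets.UTF_8).length;
            int u16 = s.getBytes(StandardCharsets.UTF_16BE).length;     // no BOM
            int u32 = s.getBytes(Charset.forName("UTF-32BE")).length;   // no BOM

            System.out.printf("UTF-8: %d  UTF-16: %d  UTF-32: %d%n", u8, u16, u32);
            // ASCII-heavy text is smallest in UTF-8 by a wide margin;
            // UTF-32 spends four bytes on every code point regardless.
        }
    }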
>
> Bringing this discussion back to the z/Architecture instruction set, it
> once seemed unnecessary to me that there were instructions for handling
> UTF-16, such as CUTFU. But I later realized IBM added many of the
> instructions specifically for their JVM. Java strings are UTF-16. Basic
> things like determining the number of characters in a Java string require
> special processing, so I expect most Java applications incorrectly handle
> characters above U+FFFF, such as the characters common to ALL modern
> scripts: emoji. 😀
> (U+1F600)
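
The "special processing" is visible with nothing but standard
java.lang.String methods; a minimal sketch: length() counts UTF-16 code
units, while codePointCount() (or the codePoints() stream) counts actual
code points.

    // JavaLengthDemo.java - counting "characters" in a UTF-16 Java string.
    public class JavaLengthDemo {
        public static void main(String[] args) {
            String grin = "\uD83D\uDE00";   // U+1F600, one user-visible character

            System.out.println(grin.length());                           // 2 code units
            System.out.println(grin.codePointCount(0, grin.length()));   // 1 code point
            System.out.println(grin.codePoints().count());               // 1, stream form
        }
    }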
>
> David
>
> P.S. I am not dumping on Java as a language. All human and programming
> languages have their quirks. I have coded in well over 20 languages, and
> find Java to be far from the worst. My biggest criticism is the original
> design choice of UCS-2 for Java strings and the subsequent inability to
> move beyond UTF-16. But I get how hard it is to change these design choices.
> Python 2 is still heavily used despite being insecure and unsupported,
> simply because Python 3 changed strings from byte strings to Unicode and
> some people really don’t like change.
